PREFACE TO FOURTH EDITION


This  book  was  first  published  in  1901,  and  was  then  based 
on  lectures  delivered  at  the  School  of  Economics  in  the  five 
years  following  its  foundation  in  1895.  Two  further  editions 
have  been  issued  in  which  the  text  was  revised  without  any 
important  alteration,  and  an  Appendix  added  dealing  with  the 
second  approximation  to  the  normal  curve  of  error,  and  sub¬ 
sequently  some  pages  of  addenda  were  circulated.  In  the 
present  edition  Part  I  remains  substantially  as  it  was  in  1901, 
except  that  Section  III  of  Chapter  III  has  been  replaced  by  a 
new  illustration,  the  chapter  on  Averages  has  been  rearranged, 
a  chapter  on  the  measurement  of  dispersion  takes  the  place  of 
the  former  Chapter  V,  in  Chapter  IX  the  treatment  of  retail 
index-numbers  has  been  reconsidered,  and  the  second  section 
of  Chapter  X  has  been  recast.  At  the  same  time  those  parts 
of  the  text  which  were  out  of  date  have  been  replaced  by  more 
modern  material  and  the  whole  has  been  revised,  but  with  as 
little  alteration  of  the  original  as  possible,  since  a  revised 
version  may  by  too  much  attention  to  detail  destroy  the  balance 
of  the  original.  On  the  other  hand,  Part  II  has  been  completely 
rewritten  and  considerably  extended,  both  by  the  more  detailed 
and  extended  treatment  of  theory  and  by  the  addition  of  a 
number  of  examples  which  illustrate  the  arithmetical  use  of 
the  formulae  and  show  the  scope  of  the  application  of  the  theory. 
For  the  convenience  of  those  who  possess  the  earlier  edition, 
to  whom  the  revised  Part  I  contains  little  that  is  new,  Part  II 
is  issued  separately ;  while  for  those  whose  mathematical  know¬ 
ledge  is  too  slight  to  allow  them  to  follow  the  treatment  in 
Part  II  in  its  new  form  Part  I  is  also  issued  separately.  But 
the  two  Parts  together  are  essentially  one  book  with  a  common 
index  and  with  cross  references  from  one  to  another. 

The  whole  book  is  intended  to  form  a  general  introduction 
to  the  theory  and  practice  of  statistics  for  all  persons  whose 
business it is to handle them or to whom a general understanding
both of the utility of statistical results and the limitations
of statistical investigation is important. It is not in any way
intended  to  be  a  compendium  of  facts,  and  the  tables  inserted 
are  only  to  afford  illustrations  of  method,  nor  does  it  contain 
any  detailed  account  of  published  statistics ;  but  it  is  hoped 
that  a  reader  will  find  himself  in  a  position  to  understand,  and 
above  all  to  appraise  and  criticise,  tables  and  results  published 
officially  or  otherwise  relating  to  any  of  those  very  numerous 
subjects  in  which  numerical  knowledge  of  facts  and  their  inter¬ 
relation  is  essential.  No  attempt  is  made  to  treat  the  history 
or  bibliography  of  the  subject ;  there  are  many  books  extant 
in  English,  French  and  German  which  devote  considerable 
space  to  the  historical  development  of  the  methods  and  practice 
of  statistics,  with  bibliographical  references ;  it  seemed  better 
here  to  omit  these  aspects  altogether  than  to  give  them  a 
cursory  treatment.  With  these  limitations  it  is  hoped  that 
the  treatment  in  Part  I  covers  adequately  the  great  part  of  the 
methods  and  technique  necessary  for  ordinary  statistical  work 
so  far  as  this  can  be  done  without  the  use  of  any  but  the  most 
elementary  mathematics.  The  chapter  on  Interpolation, 
indeed,  uses  symbols  which  at  first  sight  may  look  formidable 
to  the  non-mathematician ;  but  in  fact  the  use  of  finite 
differences  and  of  Newton’s  formula  of  interpolation  is  quite 
simple  and  the  arithmetic  involved  very  easy,  and  the  great 
part  of  the  chapter  should  be  readily  intelligible  to  those  who 
have  a  school  training  in  graphic  algebra. 

Part  II  makes  much  greater  demands  both  on  preliminary 
training  and  on  the  power  of  following  somewhat  involved 
abstract  reasoning.  The  actual  knowledge  postulated  is  that 
obtainable  in  a  graduate  course  on  the  calculus,  and  the  only 
theorems  not  generally  included  in  such  a  course  are  proved 
(in  an  abbreviated  form)  in  the  Appendix.  In  the  first  edition 
an  effort  was  made  to  obtain  the  principal  results  without  the 
use  of  the  Calculus ;  but  as  the  subject  has  developed  during 
the  past  twenty  years,  it  has  become  necessary  to  abandon  this 
attempt.  The  results  that  can  be  reached  by  algebra  alone 
are  no  doubt  important  and  useful,  but  there  is  so  much  of  at 
least  equal  utility  that  can  only  be  appreciated  after  more 
advanced  mathematical  study  that  a  student  will  save  time 
in  the  end  by  becoming  familiar  with  the  elements  of  the 
infinitesimal  calculus  before  he  commences  the  serious  study  of 
mathematical  statistics.  This  opinion  is  confirmed  by  the  very 
loose reasoning often employed by writers who make too facile
use  of  the  standard  deviation,  of  curves  of  frequency  and 
especially  of  the  coefficient  of  correlation.  Very  great  care 
has  been  taken  in  Chapter  VI,  Part  II,  to  show  as  exactly  as 
possible  the  meaning  of  the  measurement  of  correlation  by  this 
coefficient  and  its  implications,  and  very  much  more  might 
have  been  said  before  the  subject  was  too  thoroughly  explored. 
No  one  should  attempt  to  measure  correlation  till  he  has 
studied  the  theory  closely  and  critically. 

Though  the  treatment  in  Part  II  is  intended  to  serve  as  a 
general  introduction  to  mathematical  statistics  whatever  the 
subject-matter to which they are applied and to include definitions
and explanations of the terms and measurements in
common  use,  so  as  to  be  of  assistance  to  students  in  all  branches 
of  science  that  involve  group  measurements,  yet  the  order  of 
treatment  and  in  particular  the  worked  examples  are  chosen 
principally  with  reference  to  the  problems  that  arise  in  socio¬ 
logical  and  economic  investigations,  many  of  the  examples  in 
fact  being  taken  from  researches  I  have  personally  made  in 
which  mathematical  treatment  was  only  introduced  so  far  as 
the  line  of  inquiry  called  for  it.  In  consequence  of  this  the 
reader  who  is  familiar  with  the  writings  of  Professor  Karl 
Pearson,  Mr.  Elderton,  Mr.  Hardy,  Mr.  Yule  and  Dr.  Green¬ 
wood  will  notice  that  little  emphasis  is  laid  on  applications  to 
biological  or  to  actuarial  problems,  while  prominence  is  given 
to  formulae  and  to  methods  which  have  received  less  attention. 

It  is  unfortunately  the  case  that  a  great  deal  of  controversy 
has  arisen  with  reference  not  only  to  the  best  methods  of  treat¬ 
ment,  but  also  to  the  fundamental  conceptions  that  underlie 
the  application  of  the  principles  of  mathematical  probability  to 
statistical  observations.  I  cannot  hope  to  have  avoided  con¬ 
troversial  questions  (for,  indeed,  if  these  were  rigidly  excluded 
there  would  be  little  left),  but  I  have  endeavoured  to  put  in  the 
foreground  those  methods  and  principles  which  command 
general  acceptance  and  to  omit  those  which  are  the  subject  of 
dispute  and  are  unessential.  In  one  respect,  however,  a  definite 
course  is  followed  which  will  not  meet  with  universal  approval ; 
in  my  opinion  the  standard  deviation  has  only  limited  utility 
unless  it  is  connected  with  a  table  of  probability  by  which  the 
chances  of  exceeding  given  multiples  of  this  deviation  can  be 
calculated,  and  consequently  I  have  emphasised  the  normality 
of the distribution of averages and other functions arising from
samples.  In  the  same  connection  it  is  necessary  to  proceed 
from  direct  probability  to  so-called  inverse  probability,  and 
here  I  have  gone  further  than  many  writers.  If  we  are  judging 
a  universe  from  a  sample,  we  have  not  arrived  at  any  definite 
result till we can make such a statement as “the most probable
average (or whatever may be the quantity in question) in the
universe is A, and the chances that the average differs from A
by $d_1, d_2, \ldots$ are $p_1, p_2, \ldots$”; this involves a definite use of
inverse probability and of a table of probability. A mere
statement  of  the  standard  deviation  tells  us  very  little. 
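
The connection between the standard deviation and a table of probability can be illustrated by a short sketch. It assumes, as the argument above does, that the quantity in question is normally distributed; the function name and the multiples chosen are illustrative only and do not come from the text.

```python
import math


def chance_of_exceeding(k):
    """Chance that a normally distributed quantity differs from its
    mean, in either direction, by more than k standard deviations."""
    return math.erfc(k / math.sqrt(2))


# A miniature "table of probability" for a few multiples of the
# standard deviation (the multiples are chosen for illustration).
for k in (1, 2, 3):
    print(f"deviation beyond {k} s.d.: chance {chance_of_exceeding(k):.4f}")
```

Under the normal assumption the statement "the standard deviation is s" becomes a statement of chances: roughly one case in three falls beyond one standard deviation, and fewer than one in twenty beyond two.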

It  will  be  evident  that  there  is  little  that  is  wholly  original 
in  the  theorems  or  formulae  in  this  book,  and  that  I  am  indebted 
especially  to  Professor  Edgeworth,  Professor  Karl  Pearson, 
Dr.  Sheppard,  and  to  Mr.  Yule  again  and  again.  I  have  hesi¬ 
tated,  however,  to  attribute  explicitly  to  these  writers  all  that 
I  owe  to  them,  for  the  order  of  treatment  followed  has  frequently 
made  it  necessary  to  modify  and  adjust  their  work  in  a  way 
which  would  not  always  command  their  assent ;  but  I  have 
endeavoured  to  acknowledge  the  sources  from  which  I  have 
drawn  and  to  give  such  references  as  will  enable  the  reader  to 
study  the  originals. 

Mr. J. P. Clatworthy (of University College, Reading), Mr.
H.  Curwen  and  Miss  M.  Hogg  (of  the  School  of  Economics), 
and  Mr.  Menzler  have  rendered  much  assistance  in  correcting 
the  proofs  of  Part  II  and  in  verifying  parts  of  the  arithmetic ; 
and  to  the  first-named  I  am  much  indebted  for  valuable 
criticism  of  the  detail  of  the  mathematical  analysis. 

A.  L.  Bowley. 

London  School  of  Economics  and  Political  Science. 

September  1920. 
CHAPTER I.

SCOPE  AND  MEANING  OF  STATISTICS. 


[Sidenote: Definitions of statistics.]
Very many definitions have been given of the word statistics,
and each author who has written on the subject has
assigned new limits to the field which should be
included in its scope. It will not be necessary
for  the  purpose  of  this  book  to  discuss  the  merely  verbal 
differences  involved,  but  only  to  explain  what  is  intended  by 
its  title,  and  to  consider  the  limits  of  the  science  which  it  is 
proposed  to  investigate.  It  will  be  useful,  however,  to  mention 
some  possible  definitions. 

[Sidenote: The science of counting.]
Statistics may, for instance, be called the science of counting.
Counting appears at first sight to be a very simple operation,
which any one can perform or which can be done
automatically; but, as a matter of fact, when we
come to large numbers, e.g., the population of the United Kingdom,
counting is by no means easy, or within the power of an
individual ;  limits  of  time  and  place  alone  prevent  it  being  so 
carried  out,  and  in  no  way  can  absolute  accuracy  be  obtained 
when the numbers surpass certain limits.
[Sidenote: Distinction between statistics and arithmetic.]
Great numbers are not counted correctly to a unit, they are estimated; and we
might perhaps point to this as a division between
arithmetic and statistics, that whereas arithmetic
attains exactness, statistics deals with estimates,
sometimes  very  accurate,  and  often  sufficiently  so  for  their 
purpose, but never mathematically exact.
[Sidenote: Statistics as co-operative ...]
Statistics generally relate to numbers so great that their estimation is beyond the
power of an individual, and requires the co-operation of
an organised body of workers. Though the
collection of numbers by several persons and
the  mere  addition  of  the  results  seem  simply 
questions  of  arithmetic,  yet  in  practice  two  difficulties  soon 
occur.  First,  it  is  not  easy  to  define  the  thing  to  be  counted 
so  explicitly  that  all  the  tellers  shall  admit  and  reject  instances 
on  the  same  principles;  for  such  simple  objects  as  the  number 
of  rooms  or  stories  of  a  house,  a  person’s  age,  even  an  indi¬ 
vidual,  give  rise  to  such  complex  questions  of  definition  that 
it  is  often  impossible  to  tell  from  a  short  description  of  a 
category  exactly  what  items  are  included  in  it.  Secondly, 
numerical  errors  cannot  be  avoided  when  many  workers  are 
involved;  for  some  among  a  large  number  of  persons  will  be 
inaccurate,  some  unintelligent,  some  will  not  obtain  complete 
information,  and  when  their  reports  are  compiled  there  will  be 
occasional  mistakes  in  copying  and  errors  in  tabulation.  A 
total  which  is  the  result  of  the  work  of  many  hands  will  cer¬ 
tainly  from  one  cause  or  another  fall  short  of  complete  accuracy. 
But  though  all  estimates  of  this  nature  are  sometimes  included 
under the term statistics, this definition at once is too wide,
and  also  does  not  bring  out  the  distinctive  nature  of  statistical 
method. 

[Sidenote: Statistics as a method.]
It is better, in fact, to define statistics a posteriori. In
dealing with masses of figures, large numbers descriptive of
groups, series of totals or averages relating to
different dates or places, it is found that special
methods  become  necessary — methods  which  depend  on  par¬ 
ticular  properties  of  large  numbers,  methods  which  are  suitable 
for  describing  complex  groups  so  that  they  can  be  easily  com¬ 
prehended,  methods  for  analysing  the  accuracy  of  statements, 
for  measuring  the  significance  of  differences,  for  comparing  one 
estimate  with  another.  Those  estimates  to  which  these 
methods  apply  are  within  the  scope  of  statistics;  it  is  the 
study of these methods that is the object of this book.
[Sidenote: Generality of statistical ...]
It is clear that, under our tentative definition, statistics is not
merely a branch of political economy, nor is it
confined to any one science. A knowledge of
statistics is like a knowledge of foreign languages
or of algebra: it may prove of use at any time under any
circumstances.
[Sidenote: Its use in the physical sciences.]
It may be interesting to trace the connection of statistical
method with various branches of knowledge. To begin with
the physical sciences: there are two points in
which this method touches astronomy. The
method  of  least  squares  was  introduced  by  an  astronomer, 
anxious  to  choose  the  best  of  several  slightly  discrepant 
observations  of  the  position  of  a  star.  In  most  physical 
observations  several  measurements  are  taken  of  the  same 
quantity,  and  it  is  found  that,  however  carefully  they  are 
made,  they  never  absolutely  agree ;  just  as  the  averages 
obtained  by  different  statisticians  from  the  same  series  of 
sociological  observations  are  generally  not  identical.  From 
such  a  group  of  measurements  it  is  necessary  to  deduce  the 
most  probable  estimates ;  this  is  done  by  the  application  of 
the  law  of  error,  in  the  form  of  the  method  of  least  squares. 
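
For the simplest case, repeated measurements of a single quantity, the method of least squares reduces to taking the arithmetic mean, since that is the value which makes the sum of the squared discrepancies smallest. The sketch below checks this numerically; the observations are hypothetical figures invented for illustration, and the function names are not from the text.

```python
def sum_of_squares(estimate, observations):
    """Sum of squared discrepancies between an estimate and the observations."""
    return sum((x - estimate) ** 2 for x in observations)


def least_squares_estimate(observations):
    """Value minimising the sum of squares for one repeated quantity.

    Setting the derivative of the sum of squares to zero gives the
    arithmetic mean of the observations.
    """
    return sum(observations) / len(observations)


# Hypothetical, slightly discrepant measurements of the same quantity.
observations = [10.12, 10.07, 10.15, 10.09, 10.11]
best = least_squares_estimate(observations)

# Any estimate a little above or below the mean gives a larger sum of squares.
for trial in (best - 0.01, best, best + 0.01):
    print(f"estimate {trial:.3f}: sum of squares {sum_of_squares(trial, observations):.6f}")
```

Problems in which several unknown quantities are determined at once need the fuller apparatus of the method; the sketch covers only the one-quantity case described in this paragraph.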

[Sidenote: Progressive accuracy.]
The other point of resemblance of statistical to astronomical
method is common also to geology and to most applied
sciences. The course of scientific measurement
has generally been to take first a rough observation
of  a  quantity,  such  as  the  distance  of  the  sun,  the  thickness  of 
a  stratum,  the  atomic  weight  of  an  element,  the  specific  gravity 
of  a  substance;  then,  as  information  accumulated,  as  the 
precision  of  instruments  increased  and  methods  were  better 
adapted,  to  make  the  measurement  gradually  more  and  more 
accurate.  It  is  important  to  appreciate  this  development, 
for  in  the  present  state  of  our  knowledge,  many  statistical 
measurements  cannot  be  made  with  precision  for  want  of  data, 
and  a  critic  is  inclined  to  say  that  for  this  reason  preliminary 
estimates  are  valueless ;  but  from  the  scientific  point  of  view 
this  criticism  is  wrong,  for  a  faulty  measurement  made  on 
logical  principles  is  better  than  none,  if  limits  can  be  assigned 
to  its  possible  error,  and  may  lead  to  others  with  progressive 
improvement. 

[Sidenote: Statistics and biology.]
Passing by the general resemblance of statistical investigations
to all scientific experiments, we may notice the use of
statistics in biology. It was, perhaps, not recognised
before the publication of Professor Karl
Pearson’s  investigations,*  that  the  whole  doctrine  of  evolution 
and  heredity  rests  in  reality  on  a  statistical  basis.  It  is  in 

* See The Grammar of Science, 1900, chap. x. seq., and the references
there given.

this direction that some of the most important new work in
mathematical  statistics  is  being  done.  It  may  be  worth  while 
to  sketch  very  briefly  the  nature  of  the  problem.  Out  of  a 
great  number  of  observations,  say  the  measurements  of  the 
heights  of  a  group  of  men,  the  type  is  found — an  average, 
about  which  all  the  measurements  are  grouped  according  to 
some  definite  law.  The  problem  is  then  to  determine  whether 
this  type  or  the  grouping  about  it  changes,  and  in  what  way. 
The  differences  found  in  successive  generations  form  the  data 
on  which  arguments  as  to  evolution  and  development  are 
founded.  The  method  applies  equally  to  fossil  remains,  to 
zoological  species,  and  to  many  other  groups.  If  it  is  neg¬ 
lected,  many  valid  arguments  lose  a  great  part  of  their  force, 
and  theories  are  founded  on  personal  impressions  of  phenomena 
instead  of  on  scientific  measurement.  The  work  done  in  this 
direction  becomes  of  immediate  use  to  the  student  of  social 
questions.  The  average  wage  and  the  grouping  about  it  and 
the change in these quantities present precisely similar problems;
the correlations between the effects of different factors
are calculated by the same mathematical formulae; in fact,
these  methods  furnish  the  only  accurate  way  of  measuring 
numerical  changes  in  complex  groups.  Much  valuable  infor¬ 
mation  has  been  collected  in  anthropometrical  laboratories, 
which  has  increased  the  statistician’s  knowledge  of  facts  and 
given  birth  to  important  theoretical  principles. 

[Sidenote: Statistics and meteorology.]
Meteorology has much in common with statistics. The
chief measurements taken for the purposes of this science are
of temperature, barometrical pressure, moisture
of air, and force of the wind. One of the
problems  attacked  is  again  that  of  finding  the  type  from  a 
group  of  observations,  and  of  measuring  its  change.  The 
tables  which  state  the  average  temperature  year  by  year  are 
in  many  ways  similar  to  those  which  the  Registrar-General 
publishes  of  births,  deaths,  and  marriages.  Without  the  aid 
of  statistical  method,  the  averages  obtained  show  mere  numbers 
from  which  no  logical  deductions  can  be  made.  With  the 
help  of  this  knowledge,  it  can  be  seen  whether  the  change  from 
year  to  year  is  significant  or  accidental ;  whether  the  figures 
show  a  progressive  or  periodic  change  ;  whether  they  obey  any 
law  or  not.  The  problem  is  easily  seen  to  be  of  importance  for 
forecasting  the  future  population  and  for  many  similar  purposes. 
[Sidenote: Statistics and demography.]
We are thus brought by a short step to the province to
which statistics has sometimes been confined: the study of
demography. If in demography we include, not
merely the measurement of the numbers of the
population,  the  birth,  marriage,  and  death  rates,  the  dis¬ 
tribution  by  age,  by  sex,  and  by  locality,  in  fact,  the  figures 
which  naturally  come  from  the  census  and  the  Registrar- 
General’s  returns ;  but  include  also,  industrial  and  social 
measurements,  of  distribution  of  the  population  by  trade,  of 
income,  wages,  prices,  production,  foreign  trade,  transport, 
and  so  forth ;  we  have  extended  the  limits  of  demography  till 
it  includes  the  majority  of  the  statistical  investigations  directly 
interesting  to  students  of  sociology  or  of  political  economy. 
Without  stopping  to  decide  the  exact  limits  of  demography, 
we  can  quickly  pass  to  another  definition  of  statistics  (so  far 
as  it  concerns  such  students)  on  which  it  is  wished  to  lay  a 
certain stress: statistics is the science of the measurement of the
social organism, regarded as a whole, in all its manifestations.
In a monograph, after the fashion of Le Play, a
single family is studied; the occupations and
earnings of its members, the way these earnings
are  spent,  and  its  economic  position  generally  are 
set  down ;  but  this  study  is  not  so  far  statistical.  In  demo¬ 
graphy  we  study  the  same  quantities  when  groups  of  families 
are  concerned;  the  number  of  families  engaged  in  certain 
industries,  and  their  average  receipts,  expenditure,  and  savings  ; 
here  we  have  statistics.  In  the  monographic  method  the  indi¬ 
vidual  is  everything ;  in  the  statistical  method,  nothing.  When 
we  wish  to  obtain  a  measurement  of  the  group,  peculiarities 
of  individuals  receive  no  attention ;  it  is  only  when  the  same 
peculiarities  are  possessed  by  many  persons  that  they  become 
of  importance.  Statistics  may  rightly  be  called  the  science  of 
averages.  In  the  measurement  of  a  complex  group,  say  of 
incomes  and  wages,  the  exceptional  artiste  who  can  earn  £100 
in  an  evening,  and  the  inefficient  labourer  who  can  only  make 
sixpence  a  day,  affect  only  slightly  the  general  average ;  they 
are  not  entered  in  separate  categories ;  but  the  large  group  of 
skilled artisans who earned before 1914 forty shillings a week,
or  of  casual  labourers  who  made  less  than  fifteen  shillings,  are 
entitled  to  separate  notice.  The  exact  specification  to  be 
adopted  is  only  a  question  of  degree,  which  differs  with  the 
nature of the particular investigation in hand. The object of
a  statistical  estimate  of  a  complex  group  is  to  present  an 
outline,  to  enable  the  mind  to  comprehend  with  a  single  effort 
the  significance  of  the  whole.  To  do  this  it  is  necessary  to 
exclude  rigorously  any  presentation  of  details,  for  the  same 
reason  that,  in  a  painter’s  rendering  of  a  tree,  the  individual 
leaves  are  not  distinguished.  The  outline  will  be  a  little 
blurred,  a  little  inaccurate ;  but  it  will  be  as  distinct  and 
detailed  as  the  mind  has  power  to  grasp  it,  or  the  eye  to  see 
it ;  the  impression  will  be  rightly  given.  There  is  a  very 
important  principle  involved  in  this  method.  The  individual 
members  of  a  group  vary  continually,  the  whole  group  varies 
very  slowly.  It  is  impossible  to  follow  or  measure  the  motions 
of  separate  atoms  ;  it  is  comparatively  easy  to  state  the  laws  of 
motion  for  a  solid  body.  Great  numbers  and  the  averages 
resulting  from  them,  such  as  we  always  obtain  in  measuring 
social  phenomena,  have  great  inertia.  The  total  population, 
the  total  income,  the  birth  and  death  rates,  average  wages, 
change  very  little;  similar  quantities  relating  to  a  single 
family  change  very  fast.  It  is  this  constancy  of  great  numbers 
that  makes  statistical  measurement  possible.  It  is  to  great 
numbers  that  statistical  measurement  chiefly  applies. 
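
The constancy of great numbers can be seen in a small simulation. The sketch below draws a hypothetical individual quantity that varies widely from case to case, then compares the spread of single values with the spread of averages taken over groups of a thousand. The range of values, the group size, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that the sketch is repeatable


def individual_value():
    """A hypothetical individual quantity, varying widely between cases."""
    return rng.uniform(0, 100)


# Two hundred single values, and two hundred averages of groups of 1,000.
individuals = [individual_value() for _ in range(200)]
group_averages = [
    statistics.fmean(individual_value() for _ in range(1000))
    for _ in range(200)
]

spread_individual = statistics.pstdev(individuals)
spread_group = statistics.pstdev(group_averages)

print(f"spread of single values:  {spread_individual:.2f}")
print(f"spread of group averages: {spread_group:.2f}")
```

The averages of the large groups move within a far narrower range than the individual values, which is the inertia described above. The simulation treats the individuals as independent, which real social data need not be.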

[Sidenote: Statistics and political economy.]
The relation of statistics to political economy is a simple
one. Professor Marshall says,* “Statistics are the straw out
of which I, like every other economist, have to
make the bricks.” The statistician furnishes the
political economist with the facts, by which he
tests  his  theories  or  on  which  he  bases  them.  Since  the  econo¬ 
mist  deals  chiefly  with  phenomena  relating  to  groups,  and 
regards  the  individual  only  as  a  member  of  a  group,  it  is  to 
statistics  as  the  science  of  averages  that  he  looks  for  his  in¬ 
formation.  When  he  is  dealing  with  national  economy,  with 
the  volume  of  trade,  for  instance,  or  the  purchasing  power  of 
money,  he  is  limited  to  pure  theory,  till  statistics  as  the  science 
of  great  numbers  has  provided  the  facts.  The  chemist  experi¬ 
menting  in  his  laboratory  is  like  the  statistician ;  the  chemist 
theorising  in  his  study  is  like  the  economist.  Because  of  this 
relation  it  may  be  held  to  be  the  business  of  the  statistician  to 
collect,  arrange,  and  describe,  like  a  careful  experimentist,  but 


* Evidence to the Committee on the Census, 1890.

to draw no deductions; even in an investigation relating to
cause  and  effect,  to  present  evidence  but  not  conclusions.  As 
a  distinct  operation,  of  course,  the  statistician  may  assume 
the  role  of  the  economist,  for  the  same  man  may  well  be  quali¬ 
fied  to  conduct  the  experiment  and  fit  the  theory.  And  just 
as  a  theoretical  chemist  will  have  little  or  no  power  unless  he 
fully  appreciates  experimental  methods  and  difficulties,  even  if 
he  has  not  the  manual  dexterity  to  conduct  them  to  perfection 
himself,  so  no  student  of  political  economy  can  pretend  to 
complete  equipment  unless  he  is  master  of  the  methods  of 
statistics,  knows  its  difficulties,  can  see  where  accurate  figures 
are  possible,  can  criticise  the  statistical  evidence,  and  has  an 
almost  instinctive  perception  of  the  reliance  that  he  may 
place  on  the  estimates  given  him. 

[Sidenote: Statistics versus individual experience.]
The proper function, indeed, of statistics is to enlarge individual
experience. An individual is limited to what he can
himself see, a very small part of one division of
the social organism; his knowledge is extended in
various ways, by the conversation of his acquaintance,
by newspaper reports, by the writings of experts. According
to his ability and power of judgment, he will be able to
form  a  correct  view  of  the  numerical  importance  of  groups  of 
persons  and  things ;  but  it  is  in  the  highest  degree  improbable 
that  he  will  not  have  been  biassed  by  the  peculiarities  of  his 
position,  and  that  he  will  place  his  different  items  of  informa¬ 
tion  in  the  right  perspective ;  and  he  will  not  be  able  to  gauge 
rightly  the  accuracy  of  his  data.  As  soon  as  he  begins  to 
examine  these  points  he  is  undertaking  a  statistical  investiga¬ 
tion,  and  will  very  soon  find  himself  involved  in  all  the  diffi¬ 
culties  and  problems  from  which  a  knowledge  of  statistical 
method  alone  can  disentangle  him.  This  is  the  obvious 
answer  to  those  who  deny  the  use  of  statistics.  A  statistical 
estimate  may  be  good  or  bad,  accurate  or  the  reverse ;  but  in 
almost  all  cases  it  is  likely  to  be  more  accurate  than  a  casual 
observer's  impression,  and  in  the  nature  of  things  can  only 
be  disproved  by  statistical  methods. 

[Sidenote: Statistics are comparative.]
A chief practical use of statistics is to show relative importance,
the very thing which an individual is likely to misjudge.
Statistics are almost always comparative. The
absolute magnitude of a quantity is of little
meaning to us till we have some similar quantity with which
to compare it. A statement of the number of paupers in the
United  Kingdom  is  valueless  unless  we  know  the  total  popu¬ 
lation.  A  statement  of  the  number  of  gallons  of  water  supplied 
per  head  to  the  people  of  East  London  is  of  little  meaning  to 
us  till  we  know  the  quantity  supplied  to  other  towns.  The 
average  wage,  shown  in  the  Wage  Census,  does  not  convey  its 
full  significance  till  we  have  similar  computations  for  other 
countries  or  relating  to  other  years.  In  the  case  of  most 
statistical  estimates,  it  will  be  found  that  we  need  another  for 
comparison  before  we  can  appreciate  the  meaning  of  the  first. 

[Sidenote: Official statistics:]
If the group of objects which we wish to measure is large,
its enumeration will be beyond our unassisted efforts, or those
of any organisation at our command. Some
investigations, indeed, have been successfully
conducted by private organisations, for instance, those which
resulted in Booth’s Life and Labour of the People, Leone Levi’s
Wages and Earnings, and Rowntree’s Poverty; and the method
of samples has also been used (e.g., in Livelihood and Poverty,
by the present author and Burnett-Hurst) to reduce an inquiry
to  manageable  dimensions ;  but  in  general  the  measurement 
of  a  part  of  the  social  body  or  industrial  organism  must  be 
undertaken  by  the  central  or  local  governments,  if  it  is  to  be 
successfully  carried  out.  The  fact  that  this  is  the  case  explains 
the  heterogeneity  and  the  imperfection  of  the  mass  of  statistics 
extant.  A  government  primarily  collects  numerical  informa¬ 
tion  only  in  relation  to  its  own  functions.  Thus  the  administra¬ 
tion  must  know  the  numbers  of  the  population  and  the  area  of 
the  country  in  gross  and  in  detail  for  its  own  purposes.  Large 
groups  of  figures  come  simply  from  the  necessity  of  public 
account-keeping.  Many  official  figures  are  bye-products ; 
for  office  purposes  an  account  is  kept  of  all  transactions  in 
which  the  government  has  a  hand,  and  of  industries  subject 
to  special  regulations ;  and  the  government  publishes  most 
of  the  figures  which  thus  come  in  its  way.  To  such  causes 
have  been  due  our  knowledge  of  the  statistics  of  income, 
education,  imports,  railways,  mines,  factories,  and  so  on. 
Though  few  figures  are  collected  simply  for  scientific  purposes, 
yet in many cases schedules issued for administrative ends are
used  at  the  same  time  for  the  reception  of  other  information, 
of  use  chiefly  to  the  sociological  student ;  much  of  the  Census 
information comes under this heading. A view of those
figures, relating to the United Kingdom, which are easily
accessible to the student, can be obtained by turning through
the annual Statistical Abstract for the United Kingdom, the
Annual Abstract of Labour Statistics, and the Registrar-General’s
Annual Report; in one or other of these, summaries of, and
references  to,  most  official  statistics  are  to  be  found. 

It is clear that figures collected simply in connection with
administrative purposes are not likely to be precisely those
which are needed by the student of sociology or
political economy. Even where the wants of the
official and the student are nearly identical, the classification
and  tabulation  may  not  meet  scientific  requirements.  There 
has,  indeed,  been  considerable  progress  in  recent  years,  in 
the  direction  of  amassing  statistical  information  not  absolutely 
needed  by  the  administration,  and  much  of  the  work  of  the 
Labour  Department  of  the  Board  of  Trade  (now  merged  in 
the  Ministry  of  Labour)  was  of  this  kind ;  but  very  much  more 
might  reasonably  be  done,  at  an  expense  which  would  be  almost 
negligible  when  considered  in  relation  to  the  national  income. 
Thus  the  census  might  be  made,  in  part  at  least,  quinquennial, 
and  the  body  of  workers,  who  are  organised  once  in  ten  years 
to  conduct  it,  only  to  be  disbanded  when  the  report  is  issued, 
might  be  made  permanent  and  entrusted  with  the  carrying  out 
of  other  inquiries  on  a  national  scale.  Market  and  retail 
prices of many staple commodities could be tabulated, analysed
and published. Movements of goods by rail could be
tabulated  in  the  same  way  as  transport  by  water,  and  the 
anomaly  that  we  know  more  of  our  foreign  than  of  our  home 
trade  be  removed.  Records  of  home  production  need  not  be 
confined  to  agriculture,  mining,  and  steel  works,  but  extended 
on  the  lines  of  the  Census  of  Production  of  1907  till  we  know 
every  year  the  output  of  the  principal  industries.  Above  all 
a  central  statistical  office  is  needed  which  should  co-ordinate 
all  existing  statistics  and,  working  directly  or  through  the 
appropriate  Departments,  aim  at  completing  and  perfecting 
a continuous statistical account of the nation. It needs very
little study of statistics or of political economy
to feel the pressing need of more and better
co-ordinated information; illustrations of the gaps in our
knowledge are easily found. When dealing with our national
income we can obtain statistics of wages, and of income subject
to tax; but for salaries below the exemption limit, and for
part  of  the  income  received  from  foreign  investments,  we  are 
forced  to  rely  on  educated  guesses.  For  the  change  of  the 
purchasing  power  of  money  we  know,  thanks  chiefly  to  the 
Economist  and  trade  newspapers,  the  course  of  wholesale 
prices,  but  many  interesting  calculations  are  brought  to  a 
standstill  because  of  the  imperfection  of  the  records  of  retail 
prices. With regard to wages, we can estimate fairly accurately
standard and average wages, but, in default of an
industrial  census,  do  not  know  how  many  persons  are  in 
receipt  of  each  given  wage,  nor  the  relative  numbers  of  masters 
and  men.  Till  there  is  a  public  demand  for  such  information, 
it  will  need  a  very  enlightened  government  to  spare  the  time, 
trouble,  and  the  relatively  small  sums  of  money  necessary 
for  a  systematic  attempt  to  fill  up  these  gaps ;  but  every  one 
can do something towards this enlightenment, and in furtherance
of this demand, by studying what has been done in other
countries,  and  building  up  a  knowledge  of  the  science  of 
statistical  investigation. 

The absence of such a demand is perhaps due to a widely
spread and not unreasonable distrust of statistical estimates,
crystallised in the common remark that “anything
can be proved by statistics.” This is to a
great extent the fault of the criticising public themselves:
they are always requiring and the newspapers always supplying
information, which depends on a statistical basis, but for
which good statistics are not to be found for one or other of
the reasons already indicated. The informant
must perforce turn to inaccurate estimates, and
the  public  has  no  knowledge  or  discrimination  as  to  what 
estimates  rest  on  satisfactory  data,  or  indeed  as  to  what 
quantities  are  capable  of  statistical  evaluation.  Again,  figures 
which  cover  only  part  of  the  subject,  such  as  the  Wage  Census 
average,  or  the  Labour  Gazette  returns  of  unemployed,  may  be 
quoted  as  universal ;  mere  estimates,  made  for  quite  other 
purposes,  may  be  given  as  accurate  and  complete ;  and  on 
such  unreliable  premises  arguments  are  based,  which  naturally, 
by  a  judicious  choice  of  material,  can  be  made  to  support  any 
theory at pleasure. It will generally be found that the statistician,
on whose authority such statements are supposed to
be based, is not to blame. Some of the common ways of
producing a false statistical argument are to quote figures
without their context, omitting the cautions as to their incompleteness,
or to apply them to a group of phenomena quite
different to that to which they in reality relate; to take
estimates referring to only part of a group as complete; to
enumerate the events favourable to an argument, omitting
the other side; and to argue hastily from effect to cause, this
last  error  being  the  one  most  often  fathered  on  to  statistics. 
For  all  these  elementary  mistakes  in  logic,  statistics  is  held 
responsible. 

Perhaps  statisticians  themselves  have  not  always  fully 
recognised the limitations of their work. At best they can
measure only the numerical aspect of a phenomenon;
while very often they must be content
with measuring not the facts they wish, but some allied quantity.
We wish to know, for instance, the extent of poverty,
its  increase  or  diminution  :  poverty  we  cannot  define  or 
measure,  and  we  cannot  even  count  the  number  of  the  poor; 
all  we  can  do  is  to  state  the  number  of  officially  recognised 
paupers,  and  add  perhaps  some  estimates  from  private  sources ; 
but this gives us no clue to the intensity of poverty in individual
cases. Or we wish to obtain statistics of health: but
the  principal  measurements  made  are  of  the  death-rate  and 
average  length  of  life,  and  the  prevalence  of  some  diseases, 
very  different  matters.  The  statistician’s  contribution  to  a 
sociological  problem  is  only  one  of  objective  measurement, 
and  this  is  frequently  among  the  less  important  of  the  data; 
it is as necessary, however, to its solution as accurate measurements
are for the construction of a building.


CHAPTER  II. 


THE GENERAL METHOD OF STATISTICAL INVESTIGATION.

At  first  sight  it  will  seem  as  if  there  were  no  method  common 
to  all  statistical  investigations,  and  indeed  the  processes  differ 
so  widely  that  it  is  not  easy  to  outline  a  scheme  which  will 
include  them  all;  but  the  following  sequence  is  generally 
indicated  *  as  of  general  application,  and  will  serve  at  least 
to thread an examination of methods together: (1) the Collection
of Material, (2) its Tabulation, (3) the Summary, and
(4)  a  Critical  Examination  of  its  results.  The  first  three 
processes  will  be  discussed  in  detail  in  the  following  chapters. 

It  may  be  well  to  state  what  equipment  is  necessary  for  the 
student who wishes to learn statistical methods. In collection
and tabulation common-sense is the chief requisite,
and experience the chief teacher; no more than
expertness in quite simple arithmetic is necessary
for the actual processes; but since, as we shall
see immediately, all the parts of an investigation are interdependent,
it is expedient to understand the whole before
attempting  to  carry  out  a  part.  For  summarising,  it  is  well 
to  have  acquaintance  with  the  various  algebraic  averages,  and 
with  enough  geometry  for  the  interpretation  of  simple  curves, 
though  all  the  operations  can  be  performed  without  the  use  of 
algebraic symbols. For criticism of estimates and interpretation
of results, it is necessary to use the formulae of more
advanced mathematics, and it is obviously expedient to understand
the methods by which these formulae are obtained to
ensure  their  intelligent  use.  They  are  specially  necessary  for 
the  comparison  of  complex  groups,  and  for  estimating  the 
significance  of  a  divergence  from  the  average,  or  the  deviations 
in  a  list  of  periodic  figures,  and  quite  essential  in  dealing  with 
correlation. 

* See, e.g., Dr. Bertillon’s Cours élémentaire de Statistique, to which the
present author is indebted for some of the treatment in the following pages.

(1) Information is generally collected by issuing blank
circulars, forms of inquiry, to be filled in either by a few officials
or by many individuals, and the proper drawing
up of this form is one of the chief tasks in a good
investigation. Before this form is issued it is necessary to
formulate  a  complete  scheme  of  the  whole  undertaking,  and 
even  to  have  some  idea  of  what  the  resulting  figures  will  be,  so 
as  to  be  able  to  arrange  the  details  of  the  organisation  on  the 
right  scale,  and  adjust  the  tools  used  to  their  purpose.  As 
already  pointed  out,  the  object  whose  measurement  is  wanted 
is  not  in  general  exactly  that  which  can  be  measured,  and  the 
measurable  quantity  nearest  to  it  must  be  found;  e.g.,  when 
the average annual earnings of the working class were in question,
the quantity first measured was the average weekly wage.
Then  some  technical  knowledge  of  the  particular  subject  is 
needed ;  and,  if  not  possessed,  a  preliminary  inquiry  on  a  small 
scale  may  be  necessary  to  show  how  to  fit  means  to  ends. 
The  people  who  possess  the  information  required  must  be 
discovered  and  interrogated  at  first  hand.  The  questions  put 
must  be  those  which  will  yield  answers  in  a  form  ready  for 
tabulation, and the scheme of tabulation must
therefore be thought out beforehand. The questions
must be so clear that a misunderstanding is impossible,
and so framed that the answers will be perfectly definite, such
as a simple number, or “yes” or “no.” They must be such as
cannot  give  offence,  or  appear  inquisitorial,  or  lead  to  partisan 
answers,  or  suppression  of  part  of  the  facts.  The  mean  must 
be  found  between  asking  more  than  will  be  readily  answered 
and  less  than  is  wanted  for  the  purpose  in  hand.  The  form 
must  contain  necessary  instructions,  making  mistakes  difficult, 
but  must  not  be  too  complex.  The  exact  degree  of  accuracy 
required,  whether  the  answers  are  to  be  correct  to  shillings  or 
pence,  to  months  or  days,  must  be  decided.  Every  word  and 
every  square  inch  of  space  must  be  keenly  criticised.  A 
little  trouble  spent  upon  the  form  will  save  much  inconvenience 
afterwards. 

(2) In considering what method is to be adopted for tabulation,
we must remember that the investigation is intended to
furnish the answers to certain definite questions (how many
people, what wage, what price), and
each column must present some total which is relevant to these
questions. The exact scheme employed will differ in different
inquiries.  In  the  population  census,  much  of  the  tabulation  is 
almost  automatic ;  in  the  wage  census,  the  best  and  simplest 
way  to  show  the  grouping  about  the  average  wage  in  each 
occupation  had  to  be  specially  devised;  in  trade  statistics  the 
number  of  different  categories  to  be  adopted  and  the  limits 
of  each  raise  difficult  questions.  In  general,  the  scheme  of 
investigation  requires  knowledge  of  certain  groups ;  and  the 
totals  resulting  from  tabulation  should  show  the  numbers  of 
items  in  these,  so  that  after  tabulation,  instead  of  the  chaotic 
mass  of  infinitely  varying  items,  we  have  a  definite  general 
outline  of  the  whole  group  in  question. 

(3) When the raw material is worked up to this point, skill
of a different kind is wanted. From the numbers obtained, we
have to pick out the significant figures; so to
present the totals and averages as to give a
true impression to an inquirer; to summarise briefly the
information obtained; to concentrate the mass into a few
significant averages, and to describe their exact meaning in
the fewest and clearest words, for it is the result of this concentration
which will generally be used and quoted. To do
this  skilfully  requires  an  acquaintance  with  the  method  of 
averages  and  the  use  of  diagrams.  It  may  further  be  necessary 
to fill in unavoidable gaps in the figures in order to supply estimates
for intermediate years; this needs a study of the dangerous
method of interpolation. Finally, a verbal description of
the  process,  its  genesis  and  results,  and  an  estimate  of  its 
accuracy  must  be  written,  and  then  the  investigation  is  complete. 

(4) The student who has to make use of statistics should not
be content to take the results of an inquiry on authority, but
ought to acquaint himself with all these details of
method. Before the results can be criticised, it
is necessary to know the complete genesis of the figures;
whether  the  whole  field  was  covered;  exactly  whence  the 
information  tabulated  was  obtained;  whether  there  was  a 
possibility  of  bias ;  how  nearly  the  individual  answers  were 
correct ;  whether  the  informants  really  knew  the  facts  they 
related,  and  if  they  were  likely  to  state  them  correctly.  The 
published  statement  of  the  results  should  show  clearly  the 
whole  scheme  of  collection  so  as  to  make  this  criticism  possible ; 
in particular, specimens of the original blank forms should be
included, so that the reader can judge whether the original
answers  lead  definitely  and  exactly  to  the  tabulated  results. 
Internal  evidence  often  leads  to  much  useful  criticism.  It 
can  be  seen  whether  the  number  of  returns  for  each  group  is 
proportional  to  its  importance,  or  if  a  specially  important 
figure  depends  on  only  slight  evidence.  The  continuity  of  the 
figures can be examined, and the causes of sudden gaps investigated.
The returns can be divided into sample groups,
and  the  extent  of  the  correspondence  of  these  groups  with  the 
general  result  will  often  indicate  whether  the  returns  are 
sufficiently  general.  A  careful  study  of  the  more  minute 
tabulations  may  show  within  what  percentage  the  final  numbers 
may  be  expected  to  be  correct.  A  critical  examination  of  this 
kind will often show that the information obtained is insufficient
to lead to precise results, and then attention should be
directed  to  estimating  the  magnitude  of  the  effect  of  omissions 
and  inadequacy  of  data. 
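
The division of returns into sample groups, described above, may be sketched in Python as follows. This is only an illustration under stated assumptions: the wage figures are invented, the function name sample_group_means is hypothetical, and dealing the returns out in rotation into three groups is one arbitrary way of forming the samples.

```python
from statistics import mean

def sample_group_means(returns, n_groups):
    """Deal the returns in rotation into n_groups groups and average each."""
    groups = [returns[i::n_groups] for i in range(n_groups)]
    return [mean(g) for g in groups]

# Hypothetical weekly wages in shillings (invented for illustration).
returns = [24, 31, 27, 22, 35, 29, 26, 30, 28, 25, 33, 26]

overall = mean(returns)
group_means = sample_group_means(returns, 3)

# Largest departure of any group average from the general average,
# expressed as a percentage of the general average.
max_gap_pct = max(abs(m - overall) for m in group_means) / overall * 100
```

If the group averages lie close to the general average, the returns are probably sufficiently general; a wide gap, as in these invented figures, suggests that more returns are needed before a precise result is quoted.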

A  most  important  function  of  statistics  is  to  produce 
evidence  showing  the  relation  of  one  group  of  phenomena  to 
another ;  for  the  information  obtained  is  presumably  intended 
as  a  guide  for  action,  the  guidance  is  generally  needed  to  show 
what  actions  are  likely  to  produce  certain  desired  effects,  and 
this  is  best  investigated  by  finding  how  such  effects  have  been 
produced  in  the  past.  We  have  then  to  determine  whether 
changes  in  one  measurable  quantity  have  produced  changes 
in  another ;  a  problem  very  often  insoluble,  but  one  on  which 
most  light  can  be  obtained  by  the  study  of  the  relevant  statistics 
in  the  light  of  mathematics,  the  mathematics  of  probability, 
and  it  is  in  this  particular  branch  of  mathematics  that  recent 
statistical  progress  has  been  chiefly  made. 

Such  questions,  however  important,  are  somewhat  abstruse, 
and  presuppose  a  certain  amount  of  technical  knowledge  which 
is  not  in  the  possession  of  the  general  student.  The  plan  of 
this  book  is  to  postpone  all  questions  requiring  such  technical 
or  mathematical  knowledge  to  the  Second  Part,  and  to  confine 
our  earlier  discussions  to  problems  needing  no  special  training 
or  equipment. 


CHAPTER  III. 

DEFINITION  OF  UNIT.  COLLECTION  OF  DATA. 

Preliminary.  Definition  of  Unit. 

Almost the first question in the initiation of an investigation
is, What is to be counted? and nearly the last question
when the tabulation is completed is, What has been counted?
The  answer  to  the  former  gives  the  preliminary  definition, 
that  to  the  latter  shows  how  it  had  to  be  modified  in  practice. 
The  essential  difficulties  of  definition  come,  first,  from  the 
need  of  interpreting  conceptions  conveyed  or  obscured  by 
ordinary  words  into  entities  capable  of  enumeration,  and, 
secondly,  from  determining  the  things  that  can  actually  be 
counted  which  are  nearest  to  the  entities  of  which  knowledge 
is desired. Thus we may be investigating overcrowding or loss
of work through unemployment. Overcrowding
is expressed numerically in the relation between
persons and room or air-space, and differs with the age and
sex  of  the  members  of  the  household  and  the  ventilation  and 
light  of  the  rooms.  In  practice,  persons  only  can  be  counted 
(without  detailed  reference  to  their  needs),  and  the  number 
of  rooms  (a  room  being  defined  rather  arbitrarily)  or  their 
cubic  contents  can  be  recorded.  Loss  of  work  is  expressed 
numerically  in  the  number  of  ordinary  working  days  on  which 
no  paid  work  was  done.  In  practice,  those  are  counted  as 
unemployed  who  satisfy  certain  formalities  at  trade  union  or 
Labour Exchange offices, such as signing a register at a particular
hour each day. The definition of “number unemployed”
depends on the regulations relating to these registers,
and among unemployed are included only those groups of
persons who come within their scope. “Overcrowded” in
the  usage  of  the  Census  reports  means  that  the  number  of 
persons  enumerated  in  a  tenement  is  more  than  twice  the 
number  of  rooms  in  it,  a  room  being  defined  so  as  to  exclude 
bath-room,  scullery,  etc. 
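
The Census usage of “overcrowded” just quoted is precise enough to be written as a rule. The sketch below is an illustration only: the function names and the sample tenements are hypothetical, and the set of excluded rooms contains only the two named in the text, the “etc.” being left unfilled.

```python
# Rooms the text names as excluded from the count; the "etc." is not filled in.
EXCLUDED_ROOMS = {"bath-room", "scullery"}

def count_rooms(room_names):
    """Count the rooms of a tenement, leaving out the excluded kinds."""
    return sum(1 for name in room_names if name not in EXCLUDED_ROOMS)

def is_overcrowded(persons, room_names):
    """True when the persons enumerated exceed twice the number of rooms."""
    return persons > 2 * count_rooms(room_names)

# Hypothetical tenement: two countable rooms and a scullery.
tenement = ["kitchen", "bedroom", "scullery"]
five_persons = is_overcrowded(5, tenement)   # 5 > 4
four_persons = is_overcrowded(4, tenement)   # 4 is not more than 4
```

The rule takes no account of age, sex, ventilation or light, which is exactly the gap between the thing desired and the thing counted that the text describes.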

It must be realised that the words describing statistical
totals or averages, such as population, imports, tonnage of ships
entered,  average  price,  cost  of  living,  occupied,  wages,  income, 
capital,  are  technical  terms,  whose  significance  is  always  more 
definite  than  that  usual  in  conversation  or  writing,  and  may 
have  some  essential  difference  from  that  in  common  usage. 
These  terms  are  capable  of  exact  definitions,  which  can  only 
be  ascertained  from  the  original  reports  in  which  the  totals 
are  obtained,  and  it  often  happens  that  these  reports  leave 
serious  ambiguities  unsettled.  The  sections  that  follow  in 
this  chapter  illustrate  the  examination  of  the  raw  material 
of  investigations  with  a  view  to  ascertaining  the  exact  meaning 
of  the  totals  obtained. 

It  is  necessary  in  stating  totals  or  averages  to  be  as  explicit 
as  is  possible  without  too  much  verbiage,  and  to  give  definitions 
which are too complex for a simple heading in
juxtaposition to the table which contains them.
Thus in coal production we should not speak of “output per
worker,” but of “number of tons of coal brought to the surface
in the week beginning January 25th, 1920, in the aggregate
of the coal mines of Great Britain, divided by the average
number of persons employed underground in that week,” or
if this is too complex all these points should be clear from the
context or sub-headings or foot-notes, and a note
should explain how the average number of persons employed
was  computed. 

A  percentage  should  never  be  given  without  a  phrase 
showing  on  what  it  is  measured.  Thus  if  the  price  of  some 
commodity  was  £80  at  a  previous  date  and  is  £100  now,  the 
increase is 25 per cent. of its earlier price and 20 per cent.
of its present price. If it now fell 25 per cent. of its present price
it would reach £75; but if it fell 25 per cent. of its earlier price,
it would return to £80. If wages are raised four times by
10 per cent. of a standard, starting at that standard, the wages
are 100, 110, 120, 130, 140 per cent. of the standard; but
the increases in each period measured as percentages of the
wage at the beginning of that period are respectively 10, 9.1,
8.3, 7.7 approximately.
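
The arithmetic of this paragraph can be checked with a short Python sketch. The figures are those of the text; the helper name pct and the variable names are merely illustrative.

```python
def pct(change, base):
    """Express a change as a percentage of the stated base."""
    return 100 * change / base

earlier, present = 80, 100

# The same rise of 20 measured on two different bases.
rise_on_earlier = pct(present - earlier, earlier)
rise_on_present = pct(present - earlier, present)

# A fall of 25 per cent. measured on each base in turn.
after_fall_on_present = present - 0.25 * present
after_fall_on_earlier = present - 0.25 * earlier

# Wages raised four times by 10 per cent. of the standard (standard = 100).
wages = [100, 110, 120, 130, 140]
# Each increase as a percentage of the wage at the beginning of its period.
step_pcts = [round(pct(b - a, a), 1) for a, b in zip(wages, wages[1:])]
```

Every result depends on the base passed to pct, which is why a percentage quoted without its base cannot be interpreted.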

A useful way of ensuring explicitness in a complex definition,
of special importance in schemes of tabulation, can be
illustrated as follows. In a table presented to
the Income Tax Commission we find the sum
£1,970,000,000 as the total of taxable income, explanations
being given in introductory notes. The definition of this
total may be exhibited thus:

A.  Income. 

B.  Known  to  the  tax  commissioners. 

C.  As  defined  by  the  laws  and  instructions  for  assessment. 

D.  Less  allowances  for  wear  and  tear,  etc. 

E. Of persons and corporations in the United Kingdom
and of non-residents so far as they are subject to tax.

F.  Assessed  for  the  fiscal  year  1918-19. 

Each  of  the  six  phrases  expresses  a  characteristic  or  attribute 
possessed  by  every  unit  in  the  total,  and  the  exact  definition 
of  these  attributes  leads  to  the  definition  of  the  total,  and  to 
the answer to the question “what has been counted?”

We  should  generally  include  as  characteristics,  the  fact  of 
record  (B),  a  date  (F)  and  a  place  (E). 
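
The idea that a total is defined by the attributes shared by every unit in it can be sketched as a filter over records. Everything in this Python sketch is hypothetical: the record layout, the field names and the five sample assessments are invented, and only attributes B, E and F are tested explicitly, the amount being assumed to be already stated as defined by law (C) and net of allowances (D).

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    amount: int              # A, C, D: income in pounds, as assessed, net of allowances
    recorded: bool           # B: known to the tax commissioners
    subject_to_uk_tax: bool  # E: resident, or non-resident so far as subject to tax
    fiscal_year: str         # F: the year for which it was assessed

def taxable_total(records, fiscal_year):
    """Sum only the units that possess every attribute in the definition."""
    return sum(
        r.amount
        for r in records
        if r.recorded and r.subject_to_uk_tax and r.fiscal_year == fiscal_year
    )

# Invented records; only the first and last satisfy every attribute.
records = [
    Assessment(500, True, True, "1918-19"),
    Assessment(300, False, True, "1918-19"),   # not on record
    Assessment(400, True, False, "1918-19"),   # outside the place
    Assessment(200, True, True, "1917-18"),    # outside the date
    Assessment(250, True, True, "1918-19"),
]
total_1918_19 = taxable_total(records, "1918-19")
```

Dropping any one condition from the filter changes what has been counted, and therefore changes the meaning of the total.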

Section I. The Population Census.

The population census will provide good illustrations of the
principles laid down in the last chapter, both because we shall
be at first on familiar ground, since every one
knows its scheme, purpose, and details, and because
the  form  of  inquiry  used  for  the  collection  of  the  original  data 
brings  out  very  prominently  the  difficulties  met  with  in  detailed 
statistical  investigations. 

The first thing to be considered is the exact object for
which the census is undertaken. It is for demographical purposes;
to supply information as to the numbers
and local distribution of the population, the
numbers of each sex and age, their so-called civil condition
(i.e., whether single, married, or widowed), and their nationality.
This is the minimum information necessary for administrative
purposes. In addition to these facts there are very
many  others  which  the  statesman  and  the  economist  wish  to 
know  about  each  member  of  the  population,  and  the  census 
form  is  the  only  means  in  England  of  collecting  universal  data ; 
the question as to which of these shall be investigated and
which neglected, is decided more by expediency
than on principle. Of these desiderata the following
may be mentioned: the size and structure of the family,
its  position  in  the  social  scale,  the  economic  position  of  its 
head ;  the  nature  of  employment  of  its  members,  the  wage  or 
income  of  each  member  and  of  the  family  as  a  whole,  the  rent 
and  size  of  their  house,  their  educational  condition,  the  ages  at 
which  they  commenced  or  retired  from  work,  their  migrations, 
their combination in religious or other bodies, and their infirmities.
It is clear that some of this information must be dispensed
with, if the form is not to be overcrowded, and if the
tabulation  is  to  be  finished  in  any  reasonable  time;  and  an 
examination  of  the  general  nature  of  the  questions  which  can 
suitably  be  put  will  show  how  the  necessary  selection  is  made. 

First,  the  questions  must  be  those  which  the  informant  is 
able  to  answer.  Now,  if  the  questions  were  only  to  be  put 
to educated and methodical persons, doubtless a
full account could be given of the family migrations
and of the ages at which each member had been at work;
but  the  peculiarity  of  the  census  is  that  it  is  universal,  and 
the  questions  must  be  such  that  the  least  educated  and  most 
unthrifty  householder  shall  be  able  to  answer ;  in  many  cases 
such  facts  would  have  been  unrecorded  and  forgotten. 

Secondly,  the  questions  must  be  perfectly  definite,  so  that 
there  can  be  no  doubt  as  to  what  the  right  answer  should  be. 
The only answers which are of value to the
statistician are yes, no, or a simple number,
or  a  definite  place  or  date  or  the  use  of  a  word  that  has  a 
precise  meaning.  Adjectives  and  adverbs  such  as  many, 
often,  partly,  etc.,  bear  different  numerical  meanings  to 
different  people,  and,  though  they  may  express  fairly  clearly 
the  position  of  an  individual,  are  nearly  useless  for  tabulation,* 
which  is  their  only  purpose  so  far  as  the  census  is  concerned. 
Thus the question as to education would have to be, not
“state whether well, moderately, or badly educated,” but
“state at what age school was left,” or “how many years at
school?” But even if such questions were not excluded by
our first test, by the forgetfulness of the informant, the statements
given would be of little practical value, and very often
incorrect.  An  inquiry  as  to  wage  and  income  could  not  be 
made  sufficiently  definite  without  so  many  questions  as  to 
require a form to itself; for wages, as we shall see when considering
the Wage Census, require very careful definition, and
many subsidiary questions must be put to get a proper estimate;
the simple query, “what is your weekly wage or annual income?”
would be answered on so many varying principles
that the result would be valueless.

* But see p. 121, infra.

Thirdly,  the  questions  must  be  such  as  will  be  answered 
truthfully  and  without  bias.  There  is  hardly  a  demand  on 
the  census  form  which  would  not  be  excluded,  if 
this  rule  was  too  rigorously  enforced,  as  we  shall 
see  immediately.  Perhaps  the  most  difficult  in  this  respect 
is  the  question,  Employer  or  employed  ?  For  though  there  are 
many  cases  in  which  a  man  is  both  employer  and  employed  so 
that  this  question  should  be  excluded  by  our  second  test,  many 
persons  consciously  exaggerate  their  social  importance  by 
erroneously  replying  the  former.  Questions  relating  to  social 
position  must  generally  be  excluded  by  this  rule. 

Fourthly,  the  questions  must  be  those  which  will  be  answered 
willingly, and must therefore not be inquisitorial, or such as
to raise apprehension of a change of law or an
imposition of taxes. Questions as to membership
of  trade  unions,  or  of  friendly  societies,  or  as  to  insurance, 
would  be  thought  inquisitorial.  Many  would  refuse  to  state 
their  incomes,  holding  it  to  be  no  one’s  concern  but  their  own. 
Questions  as  to  rent  might  be  regarded  as  possibly  leading 
to  taxation.  Questions  as  to  religion  are  badly  answered,  as 
was  shown  in  the  evidence  before  the  Census  Committee  of 
1890,*  and  should  be  excluded  in  England  by  each  of  these 
four  rules.  Some  persons  do  not  know  what  their  religion 
should  be  named,  others  would  find  the  question  indefinite, 
others  would  deliberately  answer  wrongly,  and  many  not  at  all. 

* Report of Committee on the Census, 1890 (C. 6071).

The questions on the census form † not excluded on one
or other of these grounds are Nos. 1, 2, 3, 4, 14 and 15; these
are fairly definite, and householders are generally able and
willing to give correct answers to them. Question 5 may be
inaccurately answered in cases of divorce, separation, or
irregular unions. Questions 6, 7, 8, 9 were first introduced in
1911, and though there were many inaccuracies, the answers
have given important new information. With regard to
questions 10 and 11, there has always been difficulty in
distinguishing between a classification according to the nature of
the work a person is doing (e.g., as a clerk or a carpenter)
and a classification according to industries (e.g., where the
clerk and carpenter are employed by a textile firm); in 1911
questions 10 and 11 were devised with a view to making a
double tabulation possible. Question 12 has failed to give
complete information as to the status of a worker, and
question 13 is inadequate for many cases. No provision is made
for persons who follow two equally important occupations.
Question 16 is not definite and leads to no important results.
A further discussion of the merits of some of these is to be
found in the Report of the Committee already mentioned;*
here it is only intended to indicate the general grounds of
inclusion or exclusion.

† Facing this page.

Col. 1. Name and Surname of every Person, whether Member
   of Family, Visitor, Boarder, or Servant, who
   (1) passed the night of Sunday, April 2nd, 1911, in this
       dwelling and was alive at midnight, or
   (2) arrived in this dwelling on the morning of Monday,
       April 3rd, not having been enumerated elsewhere.
   No one else must be included.

Col. 2. Relationship to Head of Family. State whether “Head,”
   or “Wife,” “Son,” “Daughter,” or other Relative, “Visitor,”
   “Boarder,” or “Servant.”

Cols. 3, 4. Age (last birthday) and Sex. For Infants under one
   year state the age in months as “under one month,”
   “one month,” etc.
   Col. 3. Ages of Males.
   Col. 4. Ages of Females.

Cols. 5-9. Particulars as to Marriage.
   Col. 5. Write “Single,” “Married,” “Widower,” or “Widow,”
      opposite the names of all persons aged 15 years and upwards.
   Cols. 6-9. State, for each Married Woman entered on this
      Schedule, the number of:
      Col. 6. Completed years the present Marriage has lasted.
         If less than one year write “under one.”
      Cols. 7-9. Children born alive to present Marriage
         (if no children born alive write “None” in Column 7).
         Col. 7. Total Children Born Alive.
         Col. 8. Children still Living.
         Col. 9. Children who have Died.

Cols. 10-13. Profession or Occupation of Persons aged ten years
   and upwards.
   Col. 10. Personal Occupation. The reply should show the
      precise branch of Profession, Trade, Manufacture, etc.
      If engaged in any Trade or Manufacture, the particular
      kind of work done, and the Article made or Material
      worked or dealt in should be clearly indicated.
   Col. 11. Industry or Service with which worker is connected.
      This question should generally be answered by stating the
      business carried on by the employer. If this is clearly
      shown in Col. 10 the question need not be answered here.
      No entry needed for Domestic Servants in private
      employment. If employed by a public body (Government,
      Municipal, etc.) state what body.
   Col. 12. Whether Employer, Worker, or Working on Own Account.
      Write opposite the name of each person engaged in any
      Trade or Industry,
      (1) “Employer” (that is employing persons other than
          domestic servants), or
      (2) “Worker” (that is working for an employer), or
      (3) “Own Account” (that is neither employing others nor
          working for a trade employer).
   Col. 13. Whether Working at Home. Write the words “At Home”
      opposite the name of each person carrying on Trade or
      Industry at home.

Col. 14. Birthplace of every person.
   (1) If born in the United Kingdom, write the name of the
       County, and Town or Parish.
   (2) If born in any other part of the British Empire, write
       the name of the Dependency, Colony, etc., and of the
       Province or State.
   (3) If born in a Foreign Country, write the name of the
       Country.
   (4) If born at sea, write “At Sea.”
   Note. In the case of persons born elsewhere than in England
   or Wales, state whether “Resident” or “Visitor” in this
   Country.

Col. 15. Nationality of every person born in a Foreign Country.
   State whether:
   (1) “British subject by parentage.”
   (2) “Naturalised British subject,” giving year of
       naturalisation.
   Or
   (3) If of foreign nationality, state whether “French,”
       “German,” “Russian,” etc.

Col. 16. Infirmity. If any person included in this Schedule is:
   (1) “Totally Deaf,” or “Deaf and Dumb,”
   (2) “Totally Blind,”
   (3) “Lunatic,”
   (4) “Imbecile,” or “Feeble-minded,”
   state the infirmity opposite that person’s name, and the age
   at which he or she became afflicted.

(To be filled up by, or on behalf of, the Head of the Family or
other person in occupation, or in charge, of this dwelling.)

Write below the Number of Rooms in this
Dwelling (House, Tenement, or Apartment).
Count the kitchen as a room but do not count
scullery, landing, lobby, closet, bathroom; nor
warehouse, office, shop.

I declare that this Schedule is correctly filled up to the best
of my knowledge and belief.

Signature ______________
Postal Address ______________

To face page 22.

Filling up of the form.

So far we have not discussed the important question as to
who should fill in the form. If, as in the English Census,
it is to be filled in by the householder, the questions
must be much simpler in matter and words
than if it is to be filled in by an official teller. In the latter
case the form may be much more complicated, the questions
more inquisitorial and such as might lead to indefinite answers
on the part of ignorant people; for the teller would insist on
an answer, be able to exclude those obviously wrong, and
cross-question till the indefinite answers were so altered as to
allow definite tabulation. In a great and complex undertaking
like the Census, where many tellers must be impressed
for a short period, their instructions and the general plan must
be sufficiently simple; but as the extent of an inquiry contracts,
the tellers can receive more complete instructions, and
the information requisitioned may be more complex. This is
of most importance in connection with columns 10-13.

* See also the Statistical Journal, 1908, p. 496, and 1920, p. 134.

Shape of blank form.

The general shape and appearance of the sheet need attention.
If the structure of the family is to be shown, the answers
are best given on a single sheet, which must
contain enough lines for the largest ordinary
household, so that the trouble of fastening together of many
couples may be avoided, and tabulation not be hindered. The
spaces must contain plenty of room for answers in uneducated
handwriting, without making the whole so large as not to lie
easily on a desk. The instructions must be distinct and visible,
and placed in close connection with the answers; to further
this, a skilful use may be made of capitals, italics, and different
founts of type. On the form facing p. 22, those in use are
roughly reproduced in miniature.

Purpose to be shown.

The form should always show for what purpose the figures
are collected, and how they will be used, in order to enlist the
support of the informant and allay misapprehension.
The extent to which this should be done
depends a good deal on whether the filling-up is compulsory,
as in the population census, or voluntary, as in the wage
census. In the case before us no preamble is necessary, since
every one knows the main features of a census, and most are
willing to further its objects; but it must be shown that the
inquiry is sanctioned by Parliament, and that compliance is
compulsory. This is done on the back, on the fold which is
outside before the form is opened; and even though penalties
are threatened against absence of or falsification of returns, the
last sentence on the back and a statement on the front of the
form guarantee the informant against injurious or personal
use of his answers. Where information is voluntary, a careful
letter should be printed and circulated with the form,
persuading the informant to give his assistance.

Subsidiary information.

While the main part of the form is filled in by the householder,
other parts are filled in by the officials, and with very
little trouble a good deal of subsidiary information
can be collected in this way. On the outside the
Registration district and sub-district, enumerator’s district and
the postal address are written, from which the numbers can
be tabulated for any of the areas required. The teller could
also, as he took the form, enter the number of stories to a
house, which is not done in the English Census, and other
information as to the style of house and street might be
endorsed. In a more intensive investigation, expert assistants
could be trusted to come out of a house with an accurate
knowledge of many interesting details.

Lines and columns.

We can now proceed to the individual criticism of the form
in the light of the rules suggested above. In the first place,
even the arrangement of columns is not perfect.
To labourers who are not in the habit of writing
at all, and who have (to judge from election posters) to be
instructed how to put their mark in the right place on a ballot
paper  (many  papers  being  destroyed  simply  through  ignorance), 
this  arrangement  of  horizontal  and  vertical  columns  would  be 
confusing,  and  without  help  they  would  not  gather  at  all  what 
they  were  to  do.  They  would  fill  up  more  easily  a  paper  in 
which  the  answers  were  to  follow  the  questions  immediately  : — 

State  your  Name _ 

State  your  Age _ 

State your Sex _

Unmarried,  Married,  or  Widowed _ 

and  so  on. 

This  form,  however,  could  only  be  used  if  a  separate  paper  were 
to  be  filled  in  for  every  individual,  children  and  all,  as  is  the 
case  in  France. 

Criticism of the questions.

The first question, which for the general purpose of the
census should be the most definite of all, leaves some room for
doubt. What constitutes “passing the night” in the case of a
night-watchman returning at 4 a.m., or of a printer at 2 a.m.?
How is the householder to know whether any of his
establishment are returned elsewhere? Since too many
instructions only lead to confusion,
the tellers should be specially taught the answers to such
questions.

Meaning of population.

The very meaning of the phrase “population of a district”
is open to much doubt. In France “la population de fait,”
which consists of all present in the given district
at the given moment, is distinguished from “la
population de droit,” which consists of all usually resident in the
district, including those temporarily absent, and excluding those
only momentarily present, and from “la population municipale,”
which is “la population de droit,” less prisoners, hospital
patients, scholars resident in schools, members of convents,
the army, and so on.* The English Census has counted only
“la population de fait.” In the United States in 1890 we
find a “constitutional population,” which excludes residents
in Indian Reservations, the Territories, and the District
of Columbia; the “general population,” which includes in
addition the Territories (except the Indian Reservation, Indian
Territory, and Alaska); and the “total population,” which
includes all excluded in the former.* For 1910, the population
as generally quoted is that of continental U.S.A., viz.,
forty-eight States (including Arizona and New Mexico, formerly
territories) and the District of Columbia. We find also a total
which includes Alaska, Hawaii, Porto Rico, and the army and
navy abroad, and another total which adds to these the
estimated population of the Philippines, Guam, Samoa, and the
Panama Canal zone. For the apportionment of taxes the
population of the District of Columbia and Indians are
subtracted from the continental population. Notice that the
Channel Islands and the Isle of Man are included in the English
Census enumeration, but not in the total generally quoted.
Also an account of soldiers and seamen at sea or abroad is
given in a table, but they are not included in the total.

* See Bertillon, ibid., p. 146.
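The three French definitions can be set side by side in a short sketch. It is illustrative only: the `Person` record, its three flags, and the four sample persons are assumptions introduced here, not census categories taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record; the field names are assumptions for illustration."""
    label: str
    present: bool                 # in the district at the census moment
    usual_resident: bool          # ordinarily resident in the district
    institutional: bool = False   # prisoner, hospital patient, soldier, etc.


def population_de_fait(people):
    """All persons present in the district at the given moment."""
    return [p for p in people if p.present]


def population_de_droit(people):
    """All usual residents, including those temporarily absent."""
    return [p for p in people if p.usual_resident]


def population_municipale(people):
    """Population de droit less the institutional classes."""
    return [p for p in population_de_droit(people) if not p.institutional]


people = [
    Person("resident at home", present=True, usual_resident=True),
    Person("resident away for the night", present=False, usual_resident=True),
    Person("visitor", present=True, usual_resident=False),
    Person("hospital patient", present=True, usual_resident=True,
           institutional=True),
]

counts = {
    "de fait": len(population_de_fait(people)),
    "de droit": len(population_de_droit(people)),
    "municipale": len(population_municipale(people)),
}
print(counts)
```

The same four persons give a different total under each definition, which is why the definition in force must be known before two "populations" are compared.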

It  is  possible  to  find  difficulties  in  filling  up  each  of  the 
columns,  owing  to  ignorance  or  ambiguity.  For  illustration, 
consider  how  column  2  should  be  filled  in  in  the  case  of  a  cousin 
who  was  a  “paying  guest,”  or  a  relation  who  was  a  visitor;  for 
column  5,  is  a  divorced  person  single  or  a  widower,  and  what  of 
a  woman  who  is  doubtful  whether  her  husband  is  lost  at  sea  ? 

It is well known that columns 3 and 4 are wrongly filled
in for two reasons: one, that elderly people often do not
know their ages accurately and enter them to the
nearest round number, so that the returns congregate
at 40, 50, 60: the error thus arising is eliminated by
tabulation in the groups 35-45, 45-55 years, etc., and for more
minute tabulation the groups 3-7, 8-12, 13-17, etc., are
suggested; the other is that many women habitually enter their
ages too low; in this case also the Registrar-General is able
to deduce nearly correct totals.
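The grouping described above places each round number in the middle of its group, so that ages heaped on 40 or 50 are counted with their neighbours. A minimal sketch follows, assuming that the ten-year groups run from 35 up to but not including 45, and that the five-year groups 3-7, 8-12 are inclusive at both ends; the function names and the list of stated ages are invented for illustration.

```python
from collections import Counter


def ten_year_group(age):
    """Ten-year group centred on a multiple of ten: 35-45, 45-55, ...
    The upper bound is taken as exclusive, so 45 falls in 45-55."""
    lower = 10 * ((age + 5) // 10) - 5
    return f"{max(lower, 0)}-{lower + 10}"


def five_year_group(age):
    """Five-year group centred on a multiple of five: 3-7, 8-12, 13-17, ...
    Both bounds are inclusive."""
    centre = 5 * ((age + 2) // 5)
    return f"{max(centre - 2, 0)}-{centre + 2}"


# Hypothetical stated ages, heaped on the round numbers 40, 50 and 60.
stated_ages = [38, 40, 40, 40, 42, 49, 50, 50, 51, 60, 60]

ten_year_table = Counter(ten_year_group(a) for a in stated_ages)
five_year_table = Counter(five_year_group(a) for a in stated_ages)
print(dict(ten_year_table))
print(dict(five_year_table))
```

Because a person of 38 or 42 who writes 40 is still counted in the group 35-45, the rounding does not move anyone across a boundary unless the mis-statement exceeds about five years.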

It  is  to  be  noticed  that,  since  the  ages  stated  are  those 
“  last  birthday,”  the  age  will  on  the  average  be  given  six 
months  too  low,  and,  in  fact,  the  ages  given  as  17,  e.g.,  should 
be  scattered  nearly  uniformly  over  the  months  to  the  eighteenth 
year. 
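The six months' understatement can be checked by a short calculation. The sketch assumes, as the paragraph does, that exact ages are spread uniformly over the months of the year, and takes the middle of each month as representative; the function name is introduced here for illustration.

```python
from fractions import Fraction


def mean_understatement(months_per_year=12):
    """Mean of (exact age - age last birthday), in years, when exact ages
    are spread uniformly over the months (mid-month values are used)."""
    offsets = [Fraction(2 * m + 1, 2 * months_per_year)
               for m in range(months_per_year)]
    return sum(offsets) / len(offsets)


bias = mean_understatement()
mean_exact_age_of_those_returned_as_17 = 17 + bias
print(bias, mean_exact_age_of_those_returned_as_17)
```

Under this assumption the persons returned as 17 have a mean exact age of seventeen and a half years.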

* Willcox: Area and Population of the United States at the XI. Census,
a book which gives a very useful criticism of the accuracy of the most
elementary data of statistics.

Occupation.

The most important criticisms of the census-schedule are to
be made on columns 10-13. It will not be expedient here to go
into all the questions raised before the Committee on the
Census as regards an industrial census. While there
can be little doubt that a thorough census of occupations
would be best undertaken separately, and on somewhat
different principles from the population census, it is
certainly better, till opinion is ripe for so radical a change, to
include in the present census the best questions we can as to
occupations, than to omit them altogether in despair of accurate
results. In any case, a census of occupations ought to be
co-ordinated with the general population census, otherwise
great difficulties of interpretation arise. Some of these may
be seen in the attempt to reconcile the statistics of the number
of persons employed in the Report of the Census of Production
(Cd. 6320, pp. 8-10) with the statistics of persons occupied
according to the Census of Population.

The  objects  aimed  at,  which  we  must  always  keep  in  mind 
when  criticising  special  questions,  are  two  :  to  find  the  number 
employed  in  each  trade  and  industry,  that  is,  so  to  say,  to 
form  vertical  divisions ;  and  to  find  the  number  in  each  kind 
or  grade  of  employment  (labourer,  artisan,  employer,  etc.,  or 
smith’s  striker,  carpenter,  weaver,  etc.)  in  horizontal  divisions ; 
so  that  the  tabulation  may  give  some  such  result  as  : — 


                    Textile Industries.

                 Cotton.    Wool.    Linen.    Totals

Employers
Managers
Clerks
Overlookers
Spinners
Weavers
Labourers
Children

Totals
The necessary minimum of information would be given by
such answers as

Legal — Solicitor — Managing  clerk. 

Mining — Coal — Hewer. 

Metal-worker — Iron — Smith’s  striker. 
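Answers of this three-part kind are what make the double tabulation possible. The sketch below is a hedged illustration, not the Census Office's procedure: it assumes each answer is written as "industry - branch - occupation" with " - " as separator, and the five sample answers are invented.

```python
from collections import Counter


def parse_answer(answer, sep=" - "):
    """Split 'Industry - Branch - Occupation' into its three parts."""
    industry, branch, occupation = (part.strip() for part in answer.split(sep))
    return industry, branch, occupation


def double_tabulation(answers):
    """Count persons by (branch, occupation), with column and row totals."""
    cells, by_branch, by_occupation = Counter(), Counter(), Counter()
    for answer in answers:
        _, branch, occupation = parse_answer(answer)
        cells[(branch, occupation)] += 1
        by_branch[branch] += 1          # vertical divisions
        by_occupation[occupation] += 1  # horizontal divisions
    return cells, by_branch, by_occupation


# Hypothetical returns from textile workers.
answers = [
    "Textile - Cotton - Spinner",
    "Textile - Cotton - Weaver",
    "Textile - Cotton - Weaver",
    "Textile - Wool - Weaver",
    "Textile - Linen - Clerk",
]

cells, by_branch, by_occupation = double_tabulation(answers)
print(cells[("Cotton", "Weaver")], by_branch["Cotton"], by_occupation["Weaver"])
```

An answer such as "miner" or "smith" supplies only one of the three parts, and so cannot be placed in any single cell of the table.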

Now  the  simple  instruction,  “  State  your  occupation,”  would  of 
course  not  lead  to  information  of  this  sort.  The  coal-hewer 
would  simply  say  miner;  the  clerk,  managing  clerk;  the 
striker,  very  likely  smith.  To  explain  what  is  wanted  and 
avoid  mistakes,  the  informant  is  referred  to  the  back  of  the 
form,  half  of  which  is  devoted  to  instructions  relating  to  these 
columns.  These  are  lucid,  carefully  picked  out  with  capitals 
and  italics,  comprehensive,  brief  and  to  the  point.  No  one 
who  wishes  to  fill  in  the  form  rightly,  and  is  sufficiently  educated 
to  understand  simple  instructions,  can  easily  go  wrong.  Yet 
it  is  probable  that  these  instructions  are  in  very  many  cases 
neither read nor followed; and this is very important in connection
with the general study of blank forms of inquiry.
Forms  issued  to  people  uninterested  in  the  object  in  view  will 
generally  be  filled  in  with  the  least  possible  expenditure  of 
time  and  intelligence.  Hence  two  courses  are  open  :  to  reduce 
the  question  to  the  simplest  possible  form,  and  make  the  best 
of  the  result ;  or  not  to  allow  the  informants  to  write  in  their 
own  answers,  but  to  take  them  viva  voce  by  means  of  a  teller, 
who  has  mastered  the  instructions,  and  has  the  necessary  legal 
force  behind  him  to  compel  information.  The  latter  course 
entails  time  and  expense. 

The  result  of  the  present  system  of  inquiry,  combined  with 
a  faulty  method  of  tabulation,  which  it  to  some  extent  makes 
necessary,  is  that  we  have  no  reliable  census  of  occupations  for 
the  United  Kingdom.  The  present  figures  break  down  both 
from  faulty  data  and  from  insufficient  tabulation  directly  we 
attempt  to  make  some  of  the  important  calculations  depending 
on  them. 

The result of the new questions.

An attempt was made in 1891 to correct to some extent
our ignorance of the relative numbers of unskilled and skilled
labourers, employers and employed, by the question
now in column 12. The headings were not
a model of clearness; there was not the ordinary imperative
“state” or “write,” nor was one told on the front of the
form whether to write Yes or No or to make a mark in the
appropriate column, nor is the distinction between the three
headings a perfectly definite one; but still one was hardly
prepared for the following statement in the report:*

“  In  numerous  instances,  no  cross  at  all  was  made ;  in  many 
others,  crosses  were  made  in  two  or  even  all  three  columns,  and, 
even  when  only  one  cross  was  made,  there  were  often  very 
strong  reasons  for  believing  that  it  has  been  made  in  the  wrong 
column.  Oftentimes  this  use  of  the  wrong  column  can  scarcely 
have  been  other  than  intentional ;  being  dictated  by  the  foolish 
but  very  common  desire  of  persons  to  magnify  the  importance 
of  their  occupational  condition.  This  desire  must  have  led 
many  subordinates  to  return  themselves  as  employers  rather 
than  as  employed,  for  it  is  only  on  this  supposition  that  we  can 
account  for  the  otherwise  unintelligible  fact  that,  under  several 
headings,  there  are  actually,  according  to  the  returns,  more 
employers  than  employed,  more  masters  than  men.  .  .  .  We 
hold  (these  returns)  to  be  excessively  untrustworthy,  and  shall 
make no use whatsoever of them in our remarks.”
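The test applied in the report, that a heading showing more masters than men cannot be trusted, is easily put into a mechanical form. The sketch below is only an illustration of such a check; the headings and figures are invented and are not taken from the 1891 returns.

```python
def suspicious_headings(returns):
    """Return the headings under which the returns show more employers
    than employed."""
    return sorted(
        heading
        for heading, (employers, employed) in returns.items()
        if employers > employed
    )


# Hypothetical figures: heading -> (employers, employed).
returns = {
    "Heading A": (120, 950),
    "Heading B": (310, 240),
    "Heading C": (75, 75),
}

flagged = suspicious_headings(returns)
print(flagged)
```

A check of this kind detects only the grossest cases; a heading may pass it and still contain many subordinates wrongly returned as employers.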

The  questions  have,  however,  continued  to  be  inserted  and 
the  numbers  tabulated,  and  statisticians  have  used  the  results 
with  a  certain  confidence. 

This  attempt  and  its  results  are  of  the  greatest  importance 
to  all  who  try  to  draw  up  forms  of  inquiry. 

Before leaving the subject, it should be mentioned in passing
that we cannot deduce directly from our census the number of
persons dependent on a particular trade for their living; that
is to say, the number of employers, their families (not otherwise
returned) and domestic servants, and the number of
employees and their dependent families. This, the most
important total for estimating the relative importance of
different trades of the country, is not tabulated, though such
tabulation has been found possible in other countries, and
we are dependent on the estimates of statisticians for such
totals.†

* General Report on the Census of 1891, p. 36 (C. 7222 of 1893).
† See Booth in Statistical Journal, vol. xlix.

To see how the information given by the answers on the
census schedule can be worked up into detailed specific
numbers, it is only necessary to look at the diagram and
table prefixed to each of the sections relating to special trades
in Mr. Booth’s Life and Labour of the People (e.g., vol. v.,
p. 46).*

Statisticians have generally to work with material provided
for them; their first task is to understand exactly the definitions
under which the data were obtained and the limitations
of the tables published. In skilled hands quite faulty compilations
have often been found to yield accurate results of great
interest.


Section  2. — The  Wage  Census. 

The main differences in method between the wage census,
as taken in 1886 and 1906, and the general population census
are: (1) that the filling up of the forms in the wage census was
voluntary; (2) that their correct filling up required a higher
degree of intelligence and education. As before, we must
consider first the object which the wage census was intended
to fulfil: it was to describe the earnings of the
people of the United Kingdom, to compare the
rates of wages trade by trade, and to find the relative numbers
earning at each rate.

The unit of time.

What is the best quantity to measure
with this object in view? As a preliminary question, should
we take the day, week, or year as the unit of
time? Clearly we shall not be able to compute
weekly wages if we only obtain daily, for the week’s work
varies from four to seven days in different occupations. The
week’s wage is a more definite quantity; but the simple
comparison of weekly wages in different trades will be deceptive,
because most trades are busier at one season of the year
than at another, and in many the difference between season
and season is very great; in any particular week, then, we may
be comparing the best season of one industry with the worst
of another. To avoid this error, and because we do not know
how many full weeks’ wages are obtained in a year, except in
a few non-intermittent trades, it would seem best to take the
year as unit; but the direct calculation of an individual’s
annual earnings is practically impossible. The employer is not
acquainted with this sum, for in large establishments the hands
are continually changing, and one man will be paid by two or
more masters in the same year; and even in a factory with a
nearly constant personnel, the weekly amounts paid to individuals
are not in general so tabulated as to be easily summed,
and the working out of the totals would require a prohibitive
amount of clerical labour. If we turn to the workman, on the
other hand, we shall find in the majority of cases that no
accurate account has been kept of earnings through the year,
and it would only be by careful individual examination,
impracticable on any large scale, that an estimate could be made;
in many cases the men, even if willing, would be quite unable
to give a connected account of their earnings during the past
twelve months.

* See p. 57, infra.
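The danger of comparing trades by the wage of a single week may be seen in a small numerical sketch. The two trades, their weekly earnings in each quarter (in shillings), and the assumption of thirteen full weeks to the quarter are all invented for illustration.

```python
WEEKS_PER_QUARTER = 13

# Hypothetical weekly earnings (shillings) in each quarter of the year.
weekly_by_quarter = {
    "trade A": [30, 30, 10, 10],  # busy in the first half of the year
    "trade B": [18, 18, 24, 24],  # busy in the second half
}


def annual_earnings(weekly):
    """Year's earnings if every week of each quarter is paid at that rate."""
    return sum(w * WEEKS_PER_QUARTER for w in weekly)


def better_paid_in_quarter(quarter):
    """Trade with the higher weekly wage in the given quarter (0 to 3)."""
    return max(weekly_by_quarter, key=lambda t: weekly_by_quarter[t][quarter])


def better_paid_over_year():
    """Trade with the higher earnings over the whole year."""
    return max(weekly_by_quarter,
               key=lambda t: annual_earnings(weekly_by_quarter[t]))


print(better_paid_in_quarter(0), better_paid_over_year())
```

A return taken in the first quarter would place trade A well above trade B, although over the twelve months trade B yields the larger sum.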

It  seems  clear  that  we  must  adopt  a  smaller  unit,  and  since 
most  wages  are  paid  weekly,  a  week  is  the  most  natural  one. 
The  subsidiary  questions  which  will  lead  best  to  an  estimate 
of  annual  earnings  will  be  discussed  below.  The  answer  to 
the  first  question,  as  to  the  best  quantity  to  investigate,  is 
indirect ;  the  only  individual  measurements  we  can  obtain 
directly  are  the  week’s  wages,  but  these  may  be  supplemented 
by  estimates  en  masse. 

Employers and employed as informants.

Next, who possess the information we require? Clearly
both employers and employed, and in an ideal census the
answers would be obtained from both groups;
but considerations of simplicity, cheapness, and
accuracy are all in favour of applying to employers alone.

If employees were to be interrogated the procedure would be
as follows. Draw up a form on the analogy of the census form,
describe very briefly the purpose of inquiry, add a short series
of concise, lucid, simple questions in suitable type and with
careful spacing, such as will lead to the minimum information
required; let these forms be left to be called for, and when
collected, let the tellers have time and opportunity to examine
and correct them. It is clear that this method would entail an
even more expensive organisation than the population census,
and as the result of experiment it may be doubted whether the
maximum of accurate information that could be thus obtained
would come up to the minimum that would be of use. A
partial inquiry can, however, be carried out by means of trade
unions, as was the case in the census of Railway Wages
undertaken by the A.S.R.S. in 1908.
The method of inquiry among employers was as follows:
Suitable blank forms and an explanatory letter were sent by
post to all employers, whose addresses could be found, in the
industries selected for investigation, and the answers were
returned to the central office by post. This is far simpler and
cheaper than the suggested scheme for inquiry among workmen,
requiring far fewer forms and only a small staff of clerks.
With business men it is a simpler matter to post the return
when completed than to keep it for collection by hand. Since
there is no personal intercourse over the matter it is especially
necessary that the questions should be lucid, for the additional
correspondence necessary to rectify errors is a source of worry
at both ends. A copy of one of these forms used in 1886,
abridged only in the number of subdivisions, is subjoined here
and on the following page.

WAGE CENSUS.

Return of the Rates of Wages Paid in Silk Manufactures.

Name of Factory or Firm ______________
Address ______________

Note. It is requested that the salaries of clerks and managers may be excluded.
The return is of wages of working men only.

Numbers employed on ______ 1886          No. ______
Amount paid in Wages in the year 1885    £ ______
Highest weekly amount paid in 1885       £ ______   Date ______
Number of Hands paid in that week        No. ______
Lowest weekly amount paid in 1885        £ ______   Date ______
Number of Hands paid in that week        No. ______

State the present average rate of pay for overtime: that is, whether
overtime is reckoned as time and a quarter or time and
a half, &c., or in what way reckoned ______________

State whether overtime is at present being worked, and how much;
or whether less than full time, and how much less ______________
Current Rates of Wages and Hours of Labour per
Week of Persons employed in each Branch of the Silk
Manufactures, on ______ 1886.

Current Rates of Wages Paid and Number of
Hours of Labour per Week when in full work,
but exclusive of Overtime.

Note. State the Number of Hours of Labour per Week,
whether the Workers were paid by Time or Piece-work,
and if paid by Piece-work give the amount
earned in a week, exclusive of Overtime.

MALES.
   Men.                           Number Employed.  Rates of Wages.  Hours of Labour.
   Lads & Boys.                   Number Employed.  Rates of Wages.  Hours of Labour.
FEMALES.
   Women. 18 years and upwards.   Number Employed.  Rates of Wages.  Hours of Labour.
   Girls. Under 18 years.         Number Employed.  Rates of Wages.  Hours of Labour.

Description of Occupation.
(N.B. It is requested that this list of occupations may be
revised where necessary.)

Silk Throwing:
   Parters                     Time / Piece
   Winders                     Time / Piece
   Cleaners                    Time / Piece
   Spinners                    Time / Piece
   Doublers                    Time / Piece
   Dressers                    Time / Piece
   &c.

Silk Spinning:
   Openers and Sorters         Time / Piece
   Boilers                     Time / Piece
   Preparers and Carders       Time / Piece
   &c.

Silk Weaving:
   Winders                     Time / Piece
   Warpers                     Time / Piece
   Warp Pickers or Clearers    Time / Piece
   Doublers                    Time / Piece
   Fillers                     Time / Piece



ELEMENTS  OF  STATISTICS 


The  measurement  of  the  annual  earnings  of  groups  of 
workpeople  was  one  of  the  ultimate  objects  of  the  inquiry. 

[Side-note: Annual earnings.] Annual earnings are composed of many
different items, of which the following are the most important:
Ordinary weekly wages, pay for overtime, special payment for
special work (e.g., of builders if sent to a distance),
or  at  special  seasons  (such  as  the  harvest) ;  and  payments  not 
in  cash,  such  as  free  or  reduced  house-rent,  free  or  cheap  coal, 
and  special  goods  at  cheap  or  wholesale  prices  (such  as  cloth 
in  textile  factories,  or  potatoes  for  agricultural  labourers). 

When  payment  in  kind  is  at  all  general  or  important,  it  is 
generally better to proceed on a different method entirely, e.g.,
that  followed  by  the  Agricultural  Sub-Commissioners  of  the 
Labour  Commission.  When  it  consists  of  only  one  simple 
item,  such  as  a  house  rent-free,  it  can  form  the  subject  of  an 
additional  question  on  a  form  similar  to  that  on  p.  32.  In  the 
silk  industry  this  does  not  occur ;  but  this  discussion  shows  the 
necessity  of  preliminary  knowledge  on  the  part  of  the  investi¬ 
gator  before  the  right  form  of  inquiry  can  be  drawn  up. 

We  have  left  for  consideration  the  weekly  wage,  and  over¬ 
time  and  special  payments,  the  last  two  of  which  can  be  grouped 
together.  The  ordinary  weekly  wage  is  a  sufficiently  general 
and  definable  quantity  in  most  subdivisions  of  most  industries. 
A  foreman  could  generally  state  how  much  is  earned  in  an 
ordinary  full  week  for  each  of  the  hands  under  him.  In  many 
cases  there  is  an  hourly  or  weekly  sum  regulated  by  a  trade 
union,  as  in  the  building  trades.  In  others,  as  in  the  cotton 
industry,  piece-rates  are  so  regulated  as  to  bring  out  a  definite 
sum  for  the  week’s  work  graduated  in  relation  to  the  difficulty 
of  the  task ;  in  general,  a  very  rapid  survey  of  the  wage-book 
will  show  what  the  worker  in  each  subdivision  will  make  on  an 
average.  Thus  the  average  weekly  wage  in  an  ordinary  full 
week  can  be  found  with  considerable  accuracy,  but  this  takes 
us  only  part  of  the  way  in  the  calculation  of  annual  earnings ; 
we  need  to  know  in  addition  to  this  how  many  full  weeks  are 
made  in  the  year.  It  is  the  method  by  which  this  is  attempted 
on the printed form that is open to most criticism. The questions
used are on p. 32, and afford a good example of the general
difference between the quaesita and the data which are attainable.
The quaesitum is: To how many full weeks’ wage are the annual
earnings equivalent, allowing for slack weeks and overtime?
[Side-note: Quaesita and data.] The first crucial question to decide
is: Are we to allow for an average loss of time, say two weeks in
the year, through sickness, or are we to allow only for time lost
through failure of work? Since sickness is
an  individual  not  a  general  misfortune,  it  will  be  better  to 
exclude  it  if  possible.  Now  overtime  in  one  season,  especially 
if  its  wages  are  on  “  time-and-a-quarter  ”  or  “  time-and-a- 
half  ”  basis,  very  quickly  tends  to  balance  slack  time  at  another 
season,  though  it  may  be  supposed  that  it  is  rarely  the  case 
that  more  than  the  normal  week’s  wage  is  averaged  through 
the  year.  Thus  it  will  be  logical  as  well  as  simple  to  estimate 
the  year’s  earnings  as  so  many  normal  weeks'  wages.  For 
example,  if  we  found  that  two  weeks  were  lost  through  sick¬ 
ness  and  three  through  the  mill  stopping,  and  that  overtime 
in  one  busy  month  had  added  wages  equivalent  to  two  normal 
weeks,  we  should  have  forty-nine  weeks’  full  wage.  The 
figures  which  will  give  this  result  will  be  the  total  sum  paid  in 
wages  in  the  factory  in  the  year  divided  by  the  aggregate 
normal  week’s  wage  of  the  people  dependent  on  the  factory, 
supposed  all  at  work.  Thus,  if  1200  hands  (men,  women,  and 
children)  would,  if  all  at  work,  make  £1000  in  a  normal  week, 
and  this  was  the  average  number  dependent  on  the  particular 
mill,  and  if  £48,000  was  paid  in  the  year  in  wages,  annual  earn¬ 
ings  would  be  equivalent  to  forty-eight  normal  weeks,  and 
earnings  would  average  £40.  Now  the  total  paid  in  wages  is 
generally  kept  separate  in  business  accounts,  but  the  number 
dependent  on  the  mill  for  work  is  often  not  known  accurately ; 
for  the  personnel  of  a  large  establishment  is  subject  to  continual 
change,  and  the  manager  would  not  know  whether  a  person  who 
left  went  to  another  mill  or  got  no  work.  The  total  number 
of  all  who  had  worked  there  during  the  year  would  be  too  great 
for  this  purpose,  and  the  number  at  work  in  a  normal  week 
too  small.  The  number  open,  perhaps,  to  least  objection  is 
the  number  at  work  in  the  busiest  week  of  the  year ;  for  those 
absent  except  through  sickness  when  trade  is  busy  cannot  be 
said  to  be  dependent  on  the  factory,  but  if  not  at  work  else¬ 
where  are  among  the  permanent  unemployed ;  very  few  work¬ 
people  indeed  will  be  taking  their  holiday  at  a  busy  time,  and 
it  may  reasonably  be  supposed  that  all  the  factories  in  the 
same  industry  will  have  their  busy  and  slack  seasons  at  nearly 
the same time. The answers then to the printed questions (total
paid in year, and number of hands in busiest week) tell
us  all  we  need  to  know,  if  we  may  make  this  assumption ;  for 
then  the  total  sum  paid  as  wages  in  the  year,  divided  by  the 
maximum  number  employed  in  the  busiest  week,  gives  the 
average  annual  earnings.  To  find  the  equivalent  number  of 
normal  weeks,  multiply  the  maximum  number  employed  by 
the  average  wage  found  on  the  second  page  of  the  form,  so  that 
the  product  shows  the  aggregate  weekly  wage  if  all  were 
employed,  and  divide  the  total  paid  in  the  year  by  this  product. 
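
The arithmetic of the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are invented here, the figures are the worked ones from the text (two weeks of sickness, three of stoppage and overtime worth two weeks; 1200 hands, £1000 in a normal week, £48,000 paid in the year), and the aggregate normal week is passed in directly rather than built up from the second page of the form.

```python
WEEKS_IN_YEAR = 52


def net_normal_weeks(lost_sickness, lost_stoppage, overtime_equivalent):
    """Weeks of full wage left after lost time, with overtime credited back."""
    return WEEKS_IN_YEAR - lost_sickness - lost_stoppage + overtime_equivalent


def annual_earnings_summary(total_paid, max_hands, aggregate_normal_week):
    """Return (average annual earnings, equivalent number of normal weeks).

    total_paid            -- wages paid by the factory in the year
    max_hands             -- hands at work in the busiest week
    aggregate_normal_week -- wage bill of a normal week if all were at work
    """
    average_annual = total_paid / max_hands
    equivalent_weeks = total_paid / aggregate_normal_week
    return average_annual, equivalent_weeks


# Worked figures quoted in the text.
example_weeks = net_normal_weeks(2, 3, 2)
average_annual, equivalent_weeks = annual_earnings_summary(48_000, 1_200, 1_000)
print(example_weeks, average_annual, equivalent_weeks)  # prints: 49 40.0 48.0
```

In practice the aggregate normal week would be the maximum number employed multiplied by the average normal wage, which is the one quantity the first page of the form does not supply.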

The  process  may  be  illustrated  by  comparing  the  data 
obtained  in  the  more  recent  census  with  that  in  1906,  when 
more  information  was  obtained. 

[Side-note: Annual earnings and weekly wages.] In 1906 some of the
particulars obtained were as follows. The Cotton Industry of the
United Kingdom is taken as an example and the figures relate only
to those firms which made returns. [See Cd. 4545, pp. xxv-xxxvii,
3, 17, 20-28, and (for the blank schedule issued) 242-4.]


T  =  Total  wages  in  1906  =  £10,195,229. 

W = Average of 12 weekly statements * of aggregate wages
= £204,173.

N  =  Average  of  12  weekly  statements  of  aggregate  numbers 
=  212,503. 

M  =  Greatest  aggregate  recorded  among  N  =  213,472. 

Ae  =  Average  earnings  of  all  employed  in  particular  week 

=  *9-43*- 

Af = Average earnings of those employed in particular week
who worked neither overtime nor broken time =
19s. 7d.


Hence we have

A = W/N = average earnings in the 12 selected weeks = 19.21s.

Ea = T/N = average annual earnings of the average number
= £47.98.

n1 = Ea/A = number of weeks’ average earnings obtained in
the year = 49.95.

The difference between 52 and n1, i.e., 2.05 weeks, is attributable
to holidays, which range from 8 to 15 working days, but
includes also stoppages of the factories from any cause.

* The last ordinary week in each month.

Em = T/M = £47.76 = average annual earnings of the maximum
number, which is taken as the number dependent on the
factories, the variation being due to unemployment.

n2 = Em/A = 49.73 = number of weeks’ average earnings of
this maximum.

n1 - n2 = 0.21 = possible estimate of weeks lost by
unemployment in 1906.

To  this  should  be  added  an  estimate  for  the  number  unem¬ 
ployed  in  the  maximum  week. 

Ae/Af = 1.008. In this case overtime exceeds broken time on
the whole, so that earnings are 0.8 per cent. above those
obtained simply by full-time work.

Applying this percentage to n2, we obtain 50.1 as the number
of  weeks  in  the  year  in  which  full-time  earnings  could  be 
obtained  by  the  maximum  number  recorded  as  employed. 
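
As a check on the 1906 arithmetic, the quantities can be recomputed from the four recorded totals. The following Python sketch is illustrative only: the variable names are chosen here, the overtime ratio is taken as 1 plus the 0.8 per cent. quoted in the text, and the unrounded results differ from the printed figures in the second decimal place, since the printed ones appear to have been worked from rounded intermediate values.

```python
# Recorded totals for the cotton industry, 1906 (from the text).
T = 10_195_229  # total wages in the year, pounds
W = 204_173     # average of 12 weekly statements of aggregate wages, pounds
N = 212_503     # average of 12 weekly statements of aggregate numbers
M = 213_472     # greatest aggregate number recorded

# Assumption: overtime net of broken time adds 0.8 per cent. to earnings.
OVERTIME_RATIO = 1 + 0.8 / 100

A_pounds = W / N              # average weekly earnings, pounds
A_shillings = 20 * A_pounds   # the same in shillings (20s. to the pound)
E_a = T / N                   # annual earnings of the average number
n_1 = E_a / A_pounds          # weeks of average earnings in the year
E_m = T / M                   # annual earnings of the maximum number
n_2 = E_m / A_pounds          # weeks of average earnings of that maximum
weeks_lost = n_1 - n_2        # possible estimate of unemployment, in weeks
full_time_weeks = n_2 * OVERTIME_RATIO

for name, value in [("A (s.)", A_shillings), ("Ea", E_a), ("n1", n_1),
                    ("Em", E_m), ("n2", n_2), ("n1 - n2", weeks_lost),
                    ("full-time weeks", full_time_weeks)]:
    print(f"{name:>16}: {value:.3f}")
```

Note that n1 reduces algebraically to T/W and n2 to n1 multiplied by N/M, so the estimate of weeks lost depends only on the ratio of the average to the maximum number employed.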

In 1886 the corresponding totals recorded were T =
£3,148,566 for the year 1885. Af, ordinary wages in a normal
week in 1886 = 15.2s. M, the greatest number recorded in
1885 = 87,887. Hence Em = T/M = £35.8, average annual
earnings of maximum number; and if we can take Af as the
same as A (the average weekly earnings in the year), n2 =
Em ÷ Af = 47.1, the number of weeks’ earnings of this
maximum. Here we cannot compare Af and Ae for want of
data.

The  method  is  evidently  open  to  criticism  from  several 
points  of  view,  and  is  here  given  rather  to  illustrate  the  nature 
of  the  problem  and  of  the  data  which  may  help  to  solve  it, 
than  as  a  complete  statement  of  the  relation  of  normal  wages 
to  annual  earnings. 

In  addition  to  lost  time  due  to  holidays  and  to  complete 
unemployment  in  the  maximum  week,  there  is  lost  time  due 
to sickness, of which an estimate * is an average of 1.7 weeks.

* See Division of the Product of Industry, 1919, by the present author,
p. 30, and Dr. Snow in the Statistical Journal, 1912-13, p. 477.

[Side-note: The French method.] In the corresponding French wage
census, of which the results were published in 1898,* an estimate of
the number of days’ work obtained in the year is formed on a
different basis. The data collected were: (1) The variation each
month of the personnel in each industry, which is found to average
4 per cent. for the year; that is, for each 100 employed, 96 are
found who have been in the same establishment for as much as
twelve months: (2) The differences between the maximum and
minimum numbers employed in each establishment month by month
during the course of a year, which are found to average 19 per
cent. of the (? average)
personnel.  From  this  we  may  perhaps  draw  the  conclusion 
that,  on  an  average,  half  this  number,  at  least,  are  in  general 
out  of  work  :  (3)  The  number  of  different  persons  who  have 
been  employed  in  each  establishment  at  one  time  or  other  in 
the  year;  this  is  found  to  be  140  for  each  100  permanently 
employed,  from  which  the  legitimate  conclusion  is  that  the 
average  number  of  unemployed  is  not  so  much  as  40  in  140, 
i.e.,  28  per  cent.  These  two  percentages,  9  per  cent,  and  28 
per  cent.,  are  taken  to  be  the  inferior  and  superior  limits  of 
average  lack  of  work.  This  information  is  more  detailed  and 
perhaps  more  reliable  than  that  on  which  the  method,  used 
above  for  the  English  figures,  is  based.  Data  obtained  from 
syndicates  of  French  workmen  indicate  about  20  per  cent,  as 
the  average  want  of  work;  the  English  figures  obtained  by 
the  method  described  above  from  the  whole  wage  census  yield 
about  12  per  cent,  in  1886. 
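
The two limits drawn from the French data can be reproduced with a short calculation. The Python sketch below is an illustration only, with names chosen here: the lower limit is taken as half the monthly range of numbers employed, and the upper limit as the excess of different persons over the permanent staff, expressed as a share of all persons employed in the year.

```python
def lack_of_work_limits(range_pct_of_personnel, persons_per_100_permanent):
    """Return (inferior, superior) limits of average lack of work, per cent.

    range_pct_of_personnel    -- max minus min numbers employed, as a
                                 percentage of the personnel (19 in the text)
    persons_per_100_permanent -- different persons employed at some time in
                                 the year per 100 permanent (140 in the text)
    """
    inferior = range_pct_of_personnel / 2
    excess = persons_per_100_permanent - 100
    superior = 100 * excess / persons_per_100_permanent
    return inferior, superior


inferior, superior = lack_of_work_limits(19, 140)
print(f"inferior limit: {inferior:.1f} per cent.")
print(f"superior limit: {superior:.1f} per cent.")
```

The unrounded limits are 9.5 and about 28.6 per cent.; the text quotes them as 9 and 28, so the printed figures appear to be rounded down.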

This  somewhat  lengthy  discussion  on  the  few  questions 
included  on  the  first  page  of  the  form  is  a  good  illustration  of 
the  necessity  of  considerable  preliminary  study  before  a  blank 
form  can  properly  be  drawn  up.  Space  does  not  allow  a 
detailed  criticism  of  the  rest  of  the  form,  but  it  should  be 
mentioned  that  the  questions  relating  to  individual  wages  in 
1886  were  not  sufficiently  detailed.  Thus  under  “  Spinners, 
piece  ”  (see  schedule,  p.  33)  in  each  factory  the  earnings  given 
would  be  an  average  for  all  employed,  so  that  the  earnings  of 
individuals  were  not  recorded,  and  the  general  distribution 
of earnings could only be given approximately. In 1906 the
instruction was “Those earning the same amount may be
grouped together; otherwise each entry should represent only
one person,” and the actual variation in each occupation and
industry could be shown.

* Salaires et Durée du Travail, 1897, pp. 15, 16.

A  careful  comparison  of  the  two  schedules  is  recommended, 
for  it  will  throw  light  on  many  of  the  difficulties  experienced 
in  preparing  questionnaires. 

Section  3. — Example  of  an  Unofficial  Investigation. 

Investigations  without  official  authority  do  not  differ 
essentially  from  those  conducted  by  authority  if  (as  in  the 
Wage  Census)  there  is  no  compulsion  to  answer ;  but  they  are 
generally  more  limited  in  their  scope,  for  want  of  organisation 
or  funds,  and  are  at  the  same  time  freer  to  employ  the  method 
of  samples  (which  is  discussed  below,  Part  II,  Chapter  II.)  and 
induced  to  do  so  in  order  to  cover  an  adequate  field. 

[Side-note: Working-class conditions.] As an example, we may take
the investigations relating to the economic condition of the
working-classes in certain towns, whose results are published in
Livelihood and Poverty.* The problem to be considered was not
precisely defined beforehand; in brief, the intention was to
obtain  what  information  it  should  be  found  practicable  to  get, 
as  to  the  number  of  earners  and  dependents  in  working-class 
families,  their  earnings  and  their  needs,  and  to  tabulate  those 
parts  of  it  which  after  criticism  were  believed  to  rest  on  trust¬ 
worthy  answers. 

It  is  generally  the  case  in  such  investigations  that  it  is 
necessary  to  obtain  the  information  personally,  since  people 
are  not  willing  to  fill  in  and  return  questionnaires  unless  there 
is  some  strong  inducement  (e.g.,  obtaining  sugar)  to  do  so. 
Consequently  the  forms  used  need  contain  few  instructions, 
the  investigators  being  specially  selected  and  prepared  for  the 
work.  It  was  found  advantageous  to  use  cards  rather  than 
paper  schedules,  and  a  facsimile  is  given  on  p.  40. 

[Side-note: The blank card.] It had, as always, to be considered
what facts were actually known by the householder or his wife, and
what were likely to be communicated to a tactful and persistent
inquirer. Once the wife is engaged in conversation, there is no
difficulty in eliciting information as to the inhabitants of the
household, the age of those under twenty, and the occupations (and
generally the employers) of those at work; the rent and type of
house can also be easily recorded (and checked if necessary). This
information by itself led to valuable tables showing the
constitution and earning strength of the families, and the enormous
variation and absence of any standard type in these, of a kind that
have not been compiled in the census or any other official
investigation.

* Published for the Ratan Tata Foundation. G. Bell & Sons, 1915.

The  difficulties  were  found  in  assessing  family  income. 
The  wife  often  does  not  know  the  husband’s  or  the  elder 
children’s earnings, and information is not readily given in
very  many  cases.  Where  it  was  given,  it  could  be  verified  in 
selected  cases  by  inquiry  from  the  employers,  and  where  tests 
were  made  it  was  found  that  there  was  no  bias  in  the  direction 
of  either  overstatement  or  understatement.  If,  as  was  generally 
the  case,  the  occupation  was  correctly  stated,  it  was  possible 
to  estimate  with  fair  accuracy  the  normal  week’s  wage  from 
the  known  standard  in  the  town.  The  distinction  on  the 
card  between  “  last  week’s  ”  and  “  full  time  ”  earnings  was 
made  because  the  first  was  capable  of  a  definite  answer,  and 
the  second  was  often  an  estimate.  The  investigator,  having 
both  statements,  would  be  able  to  find  the  reason  for  any 
difference,  and  to  establish  the  second  (which  was  the  only 
one  used  in  tabulation)  more  definitely  than  if  it  stood  alone. 
It  was  found  necessary  to  tabulate  only  the  conditions  which 
would  exist  if  a  full  week’s  work  were  done,  and  to  leave 
aside  questions  of  sickness  and  unemployment.  The  answers 
to  the  question  as  to  “  other  sources  of  income  ”  were  certain 
to  be  imperfect ;  but,  so  far  as  they  went,  they  showed  the 
means  of  livelihood  of  some  families  whose  wages  were  evidently 
insufficient,  and  since  they  erred  only  by  omission  they  gave 
some  positive  information.  The  majority  of  working-class 
families  have  only  a  negligible  amount  of  income-yielding 
property,  and  the  main  source  of  such  income,  the  ownership 
of  the  house  inhabited  or  of  other  houses,  was  generally  reported. 

[Side-note: Minimum standard of living.] The estimates of earnings
were not believed to be sufficiently accurate to lead to a table
showing the numbers with various annual incomes, but they were
adequate for the main purpose for which they were used; this
purpose was to find out what proportion of the families had an
income (apart from charity) to bring them above a certain standard,
such as Mr. Rowntree’s minimum standard as calculated by him in
Poverty. In the great majority of cases there was no doubt, from
the constitution of the family by age and sex, the number of
dependents in it, and the nature of the man’s work, on which side
of the
line  the  household  stood.  In  the  doubtful  cases  (which  were 
kept  apart  in  the  tables)  advantage  was  taken  of  all  the  points 
noted  by  the  investigator  (including  non-numerical  statements 
written  on  the  back  of  the  card  which  was  reserved  for  this 
purpose),  and  a  reasonable  judgment  could  generally  be  made. 

The card was not shown to the informant, but was filled
in  immediately  after  the  conversation.  The  identity  of  the 
household  was  only  preserved  by  the  inquiry  number.  A 
file  number  was  written  in  to  preserve  an  order  after  the 
first  process  of  sorting.  Each  card  was  criticised,  and  the 
numbers  needed  for  tabulation  were  computed,  and  these  and 
abbreviations  showing  the  constitution  of  the  family  (such 
as  m.,  s.  :  w.,  sc.,  sc.,  in.,  where  a  man  and  his  son  were  earning 
and  his  wife,  two  school-children,  and  an  infant  were  dependent) 
were  written  in  the  small  spaces  under  the  words  “  File  No.” 
The  entries  for  the  tables  were  then  obtained  by  dealing  the 
cards  into  appropriate  packs  and  then  counting  them;  this 
process  is  rapid,  but  needs  continual  careful  verification. 
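
The dealing of cards into packs, followed by counting, corresponds to grouping records by a key and tallying each group. The Python sketch below illustrates this with three invented cards; the inquiry numbers and household compositions are hypothetical, and only the abbreviation scheme (m., s. : w., sc., sc., in.) is taken from the text.

```python
def constitution(earners, dependents):
    """Build the abbreviation written on the card, earners before the colon."""
    def codes(members):
        return ", ".join(member + "." for member in members)
    return codes(earners) + " : " + codes(dependents)


# Hypothetical cards: (inquiry number, earners, dependents).
cards = [
    (101, ["m", "s"], ["w", "sc", "sc", "in"]),
    (102, ["m"], ["w", "in"]),
    (103, ["m", "s"], ["w", "sc", "sc", "in"]),
]

# Deal the cards into packs, one pack for each family constitution.
packs = {}
for inquiry_no, earners, dependents in cards:
    packs.setdefault(constitution(earners, dependents), []).append(inquiry_no)

# Count each pack to obtain the entries for the table.
counts = {key: len(pack) for key, pack in packs.items()}

# Verification step: every card must have landed in exactly one pack.
assert sum(counts.values()) == len(cards)

for key, count in counts.items():
    print(f"{key:<28} {count}")
```

The final check on the total is the mechanical counterpart of the continual verification the hand process requires.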

The  scope  of  each  inquiry  (e.g.,  the  working-class  of 
Northampton)  needed  careful  definition.  It  had  to  be  decided 
whether  the  town  from  an  economic  point  of  view 
coincided  with  the  administrative  borough,  and, 
if not, outlying houses had to be definitely included or
excluded.  Next  it  was  necessary  to  get  an  accurate  list  of 
all  the  houses  in  the  district  and  apply  the  method  of  selection 
by  sample  to  this  list ;  the  inquiry  actually  dealt  with  whatever 
was  contained  in  the  list  used,  and  the  list  gives  the  definition 
of  its  scope.  There  is  no  accepted  definition  of  the  “  working- 
class,”  and  that  actually  used  was  in  fact  determined  during 
the  handling  of  the  cards.  As  a  preliminary,  all  the  houses 
at  first  selected  which  were  above  a  certain  rental  or  whose 
tenants  were  contained  in  a  directory  of  principal  residents 
were  excluded.  Of  those  visited  all  were  excluded  in  which 
the  principal  earner  was  a  clerk,  teacher,  or  manager.  For 
others,  such  as  shop  assistants,  commission  agents,  publicans, 
small  shopkeepers,  decisions  had  to  be  made  and  recorded  as 
the  various  cases  arose.  The  final  definition  of  the  working- 
class  households,  as  understood  in  the  inquiry,  was  then  by 
delimitation,  and  if  given  in  full  would  be  somewhat  as  follows  : 
all  households  where  the  rent  was  less  than  12s.  weekly,  in  which 
the  principal  earner  was  not  a  clerk,  teacher,  etc.  etc.  Such  a 
process  of  forming  the  definition  during  tabulation  is  of  necessity 
quite  common;  the  decisions  should  be  quite  clearly  shown 
in  the  report,  and  emphasis  should  there  be  laid  on  the  treat¬ 
ment  of  marginal  cases. 
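
A definition by delimitation of this kind is in effect a sequence of exclusion rules, with a register of rulings for the marginal cases. The Python sketch below is an illustration under assumptions: the rent limit and the excluded and marginal occupations are those named in the text, while the function name and the particular rulings shown are hypothetical, since the text records only that such decisions had to be made.

```python
RENT_LIMIT_SHILLINGS = 12  # "less than 12s. weekly"
EXCLUDED_OCCUPATIONS = {"clerk", "teacher", "manager"}
MARGINAL_OCCUPATIONS = {"shop assistant", "commission agent",
                        "publican", "small shopkeeper"}

# Hypothetical rulings, recorded as the cases arose during tabulation.
recorded_rulings = {"shop assistant": True, "publican": False}


def is_working_class(weekly_rent_shillings, principal_occupation, rulings):
    """Return True or False, or None if a marginal case awaits a ruling."""
    if weekly_rent_shillings >= RENT_LIMIT_SHILLINGS:
        return False
    if principal_occupation in EXCLUDED_OCCUPATIONS:
        return False
    if principal_occupation in MARGINAL_OCCUPATIONS:
        # No recorded ruling yields None, so the case cannot pass unnoticed.
        return rulings.get(principal_occupation)
    return True


print(is_working_class(6, "weaver", recorded_rulings))
print(is_working_class(6, "commission agent", recorded_rulings))
```

Returning a third value for undecided cases forces each marginal occupation to be ruled on explicitly, which is what makes it possible to show the decisions clearly in the report.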

[Side-note: The informants.]


Section  4. — Statistics  of  England’s  Foreign  Trade. 

The  original  schedules  which  lead  to  many  other  statistics 
are  interesting,  but  limits  of  space  must  restrict  us  to  one 
more  typical  inquiry,  that  which  leads  to  our  statistics  of 
foreign  trade. 

In  the  population  census  the  filling  in  of  the  form  is  com¬ 
pulsory  and  done  by  the  householder;  in  the  wage  census 
the  answers  were  voluntary  and  given  once  and  for  all  by  the 
employer ;  in  the  various  inquiries  undertaken  by  the  Labour 
Department  the  answers  are  voluntary,  but  in  many  cases 
periodic,  so  as  to  become  quasi-official.  The  method  of  collec¬ 
tion  of  import  and  export  statistics  is  a  blend  of  all  these. 
There  are  three  classes  of  persons  who  know  the 
facts  in  question — the  sender  of  the  goods,  the 
custom-house  official  through  whose  hands  they  pass,  and  the 
recipient  or  his  agent.  Circumstances  decide  that,  in  the  case 
of  exports  from  the  United  Kingdom,  the  exporter  or  his  agent 
sends  an  account  of  the  quantity  and  value  and  place  of  destina¬ 
tion,  etc.,  of  goods  despatched  to  the  Statistical  Department 
of  Customs;  that,  in  the  case  of  imports,  the  receiving-agent 
hands  over  an  account  of  goods  to  be  landed  to  the  custom¬ 
house  officials,  who  verify  the  account,  roughly  if  the  goods 
are  duty  free,  carefully  if  they  are  liable  to  duty;  and  that, 
in  the  case  of  transhipment,  the  goods  are  treated  in  the  same 
way  as  imports  at  the  port  of  landing,  and  to  some  extent 
verified  at  the  port  of  embarkation. 

The  blank  forms,  being  verified  by  officials  as  part  of  their 
duty,  or  having  been  filled  in  by  agents  thoroughly  used  to  the 
task,  need  no  covering  letter,  and  may  be  made  as  complicated 
as  necessary ;  no  questions  are  inserted  but  only  blank  tables. 
An  examination  of  the  forms  in  use  will  show  what  are  included 
as  exports  and  imports  in  the  Board  of  Trade  totals,  and  what 
is  the  total  amount  of  information  available  for  tabulation.* 

[Side-note: The quaesita and data.] The quantities we wish to
measure in this investigation are: the volume or weight and value
of all goods which have an exchange value, which leave our shores
or reach them from without, subdivided as regards classes of
commodities and countries of destination or origin; the values
being those at the times of loading or unloading. The quantities
we can measure are sharply distinct from these, being the records
of values and volumes which reach the Board of Trade. We should
therefore examine the forms to decide: (1) what part of imports
and exports are recorded; (2) whether the values are correctly
given, (3) the quantities accurately registered, (4) the
commodities accurately defined, (5) the countries of origin and
destination accurately distinguished in the returns.

* The following paragraphs do not take cognisance of any changes that
may have taken place since 1914.

[Side-note: Examples of information.] On reaching port the ship’s
master has to send in an account, of which an abridged specimen is
given below.

[Side-note: Dutiable goods.] The goods for quick transit are passed
at once, and a special form is sent to the Customs Establishment
similar in character to that reproduced below. The remaining goods
are treated either as dutiable or as duty-free articles. In the
list  before  us,  ten  cases  of  wine  are  entered  for  home  use,  and 
an  account  is  sent  into  the  Statistical  Office;  sixty  cases  are 
warehoused  and  another  account  (as  to  quality,  quantity,  and 
value)  is  sent  in ;  the  whole  are  registered  as  imports.  Twenty 
of  the  warehoused  cases  are  removed  to  another  port  and 
re-exported ;  an  account  is  sent,  and  they  are  entered  as  exports 
of  foreign  goods.  Twenty  are  put  on  board  ship  as  stores  at 
the  port  of  entry,  and  ten  more  removed  to  another  port  for 
the  same  purpose,  and  of  this  the  central  office  receives  an 
account ;  the  remainder  are  removed  to  another  warehouse, 
still  in  bond,  and  on  leaving  that  will  be  treated  in  one  of  the 
four  ways  just  mentioned.  Other  dutiable  articles  are  treated 
in  the  same  way. 
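
The treatment of the seventy cases of wine amounts to a small ledger, which may be set out as follows. This Python sketch is illustrative only: the quantities are those in the paragraph above, while the labels and variable names are chosen here for the purpose.

```python
# Cases of wine landed, by treatment on arrival (from the text).
landed = {"entered for home use": 10, "warehoused": 60}

# All cases landed are registered as imports, however treated afterwards.
registered_imports = sum(landed.values())

# Later removals from the warehouse, each reported to the central office.
removals = {
    "re-exported from another port": 20,
    "ship's stores at port of entry": 20,
    "ship's stores at another port": 10,
}

# Only the re-exported cases are entered as exports of foreign goods.
exports_of_foreign_goods = removals["re-exported from another port"]

# The remainder stays in bond, to be dealt with later in one of the same ways.
remaining_in_bond = landed["warehoused"] - sum(removals.values())

print(registered_imports, exports_of_foreign_goods, remaining_in_bond)
# prints: 70 20 10
```

The ledger makes plain that the recorded imports (70 cases) exceed what is retained for home consumption, which is one reason the recorded totals differ from the quantities it is desired to measure.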

[Side-note: Examination of goods.] Goods not sufficiently described
or not answering to their description are opened, their contents
entered on a “bill of sight,” and an account sent in. Private
effects are separately examined, being described on a “sufferance”
form; if they are bona-fide personal goods no record is kept of
them, except in the case of dutiable goods, which are treated as
ordinary imports. If the dutiable goods are concealed, either among
private effects or merchandise, and forfeited, they are not
reckoned as imports.

Bullion  is  entered  on  a  separate  form  and  kept  distinct 
throughout  the  accounts. 

If Sailing Vessel or Steamer: STEAMER.       Official No. _
                                             No. of Register _
                                             Date of Registry _

No. 1.                 REPORT No. 980.*
                       Port of X.

    Ship’s Name.                                    Marianne.
    Tonnage.                                        700
    British or Foreign. If British, Port of
      Registry; if Foreign, Country to which
      she belongs.                                  BRITISH.
    Number of Crew (Total . .).                     British Seamen, 12;
                                                    Foreign Seamen, -
    Name of Master, and whether a British or
      Foreign Subject.                              H. Hind.
    Port or Place from which arrived.               Havre, France.

Cargo.

Columns:
    1. Name or Names of Places where laden in order of time.
    2. Marks.
    3. Nos.
    4. Packages and Description of Goods. Particulars of Goods stowed
       loose, and General Denomination of Contents of each Package of
       Tobacco, Cigars, or Snuff intended to be imported at this Port.
    5. Particulars of Packages and Goods (if any) for any other Port
       in the United Kingdom.
    6. Goods (if any) to be Transhipped or to remain on Board for
       Exportation.
    7. Name of Consignee.

    1.               2.     3.       4.                   5.                       6.        7.
    Havre, France.   Paris to London. 600 pkgs. Fruit and Perishables.                       Smith.
                     68 pkgs. Merchandise.
                     COK    1392
                     AE     495/6
                     KG     340/9
                     EOT    1/50
                     AJ     3/6
                     CK     1
                     AC     10   }
                     KL     40   }   70 cases Wine.                                          ,,
                     ACD    20   }
                     WD     166      5 cases Woollens     in transit to Liverpool.           ,,
                     O&D    1        1 case Brandy.                                Stores.   ,,

    If any wreck fallen in with or picked up, to be stated.

    Surplus Stores remaining on board, viz.   -  3 lb. Cigars.
                                                 4 lb. Tobacco.
    Number of Alien Passengers (if any)       -  Nil.
    Pilot’s Names                             -
    At what Station Ship lying                -  South Quay.
    Agent’s Name and Address                  -  C. J. C.

I declare that the above is a just report of my Ship and of her Lading, and that
the Particulars therein inserted are true to the best of my knowledge, and that I have
not broken Bulk or delivered any Goods out of my said ship since her departure from
Havre, the last Foreign Place of Loading.

(Signed) H. HIND, Master.
Signed and declared this 13th day of October 1896
In presence of

(Countersigned)

pro- Collector.

* i.e., 980th ship at X. since 1st January.

[Side-note: Free goods.] The duty-free goods, if for transhipment
at another port, are sent there under seal, and barely examined;
they are treated at the central office in the same way as dutiable
transfer goods. The remaining free goods, which in general form the
bulk of the cargo, are entered on such a form as follows, which is
worth notice, for it is a specimen of the rough material from which
our foreign trade figures are evaluated.


ENTRY  FOR  FREE  GOODS.* 


[This space is for the use of the Officers of Customs.]

Port ________
Dock or Station ________
Importer's Name ________                                      (No.      )

Examination.   Ship's Name.   Master's Name.   Rotation No.   Date of Report.   Port or Place whence.
               Marianne.      H. Hind.         980.           13/10/96.         Havre, France.

Marks and    No. of Packages and Description of Goods,            Quantity.           Value,
Nos.         in accordance with the Official Import List.                               £.

COK 1392     One   Goods Manuf. N.O.E.: Billiard Cue Tips         ...                     28
AF 495/6     Two   Leather Shoes                                  10 doz. prs.            58
KG 340/9     Ten   Cotton Manuf.: Trimmings                       ...                    140
                                  Embroideries                    ...                    280
                                  Piece Goods, not Muslins        300 yds.                 8
EOT 1/10     Ten   Gloves of Leather                              11,240 doz. prs.    12,316
 „  11/5     Five  Silk Broad Stuffs                              ...                 10,400
 „  16/20    Five  Works of Art: Plaster Casts                    ...                    380
                                  Statuary                        ...                  1,280
                                  Pictures by Hand                3                   10,200
 „  21/5     Five  Books Bound                                    4 cwt.                 300
 „  26/30    Five  Bronze Manuf. Ornaments                        3 cwt.                  38
 „  31/5     Five  Metal Manuf. Ornamental Brass-headed Nails     4 cwt.                  24
 „  36/40    Five  Silk Manuf. Dresses, Mantles, Trimmings        ...                  1,816
 „  41/50    Ten   Goods Manuf. N.O.E.: Fancy Goods               ...                    110
                                  Horseless Carriage              ...                    160
                                  Brushes                         ...                     78
                                  Glue                            ...                    110
                                  Billiard Chalk                  ...                     12
                                  Hardware                        ...                    116
AJ 3/6       Four  Stationery Ink                                 ...                     48
CK 1         One   Iron and Steel Manuf.: Machinery,
                   British, returned                              3 cwt.                  24

I enter the above goods as free of duty, and declare the above particulars to be true.

Dated this 13th day of October 1896.

(Signed) J. Jones,
Importer or his Agent.


* In 1904 this form was altered so as to distinguish between "place of
shipment of goods," which phrase replaced "whence" in the last heading,
and "place whence goods consigned," which is now the heading of an
additional column.
Verification of data.

The information so received is usually accepted at the
central office without inquiry. It frequently happens, however,
that the form is not properly filled in by the agent, the values
often being omitted. When this is so, it is the duty of the clerk
at the port of entry to require the agent to complete the forms,
if imperfect, and to test the values by current price lists with
which he is provided. When there is a palpable error or omission
in the form, or when the price appears out of the common, a query
is sent from the central office to the port: e.g., with reference
to such a form as that just given, the following correspondence
might arise:

1. Pictures by hand, £10,200. Explain high value.
Answer: Correct; invoice was seen; pictures by Millet.

2. Books bound: is weight or value incorrect? Answer:
Both correct; advice seen; old and valuable books.

3. Goods entered as "goods manufactured, chip plaiting":
explain nature, and state if description is correct. Answer:
Correct; wood shaving plaited and occasionally mingled with
horse-hair, etc.

4. Potatoes, 40 cwt., £62. Weight or value? Answer:
Value correct. Weight should be 400 cwt.

Thus  any  unusual  entries  are  liable  to  be  checked  and 
verified. 
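
The test of declared values against a price list can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure of the central office: the price list, the tolerance of a factor of three, and the names `Entry` and `queries` are all assumptions made for the example. The potato and book entries are taken from the correspondence above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    description: str
    quantity_cwt: float
    value_pounds: float


def queries(entries, price_list, tolerance=3.0):
    """Return the entries whose declared value per cwt. differs from the
    listed price by more than `tolerance` times in either direction."""
    flagged = []
    for e in entries:
        listed = price_list.get(e.description)
        if listed is None or e.quantity_cwt <= 0:
            flagged.append(e)  # nothing to test against: always query
            continue
        ratio = (e.value_pounds / e.quantity_cwt) / listed
        if ratio > tolerance or ratio < 1 / tolerance:
            flagged.append(e)
    return flagged


# Hypothetical current prices, in pounds per cwt.
price_list = {"Potatoes": 0.15, "Books bound": 10.0}

entries = [
    Entry("Potatoes", 40, 62),     # as first entered
    Entry("Potatoes", 400, 62),    # as corrected by the port
    Entry("Books bound", 4, 300),  # old and valuable books
]

flagged = queries(entries, price_list)
for e in flagged:
    print("Query:", e.description, e.quantity_cwt, "cwt.,", e.value_pounds, "pounds")
```

Under these assumed prices the first potato entry and the books are queried, while the corrected potato entry passes. A query is only a request for explanation: the books, though flagged, were correctly entered.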

Possibility of errors.

In the case of goods not easily valued, or of miscellaneous
goods not easily tabulated, errors must arise in this way; and
another error may enter if an agent or clerk, who does not wish
to receive too many queries from headquarters, enters at ordinary
rates goods of exceptional value; but when staple commodities and
large quantities are involved, all the persons concerned will be
familiar with the forms they have to fill, the prices will be
known, and so in important cases errors will be at a minimum.
The import total values, therefore, are the sum of many quantities
of various degrees of accuracy, and it is not difficult when
looking through the list of items in the annual report to see
which are specially liable to error. Such commodities as old
books, works of art, goods where sale depends on the fluctuations
of fashion, racehorses, and so on, have values varying from day
to day, and their exact value in the balance of imports and
exports cannot be determined.
In the case of goods consigned for sale, a class which includes
the great part of the imports of wool, no value can be named by
the agent. The goods are then valued at current market prices,
and in the case of wool at the prices realised at the next
wool-sales. There is always a possibility of error here, since
the current prices may not be exactly obtained for a particular
consignment; and there is apparently permanent overvaluation of
wool, since the price at the sales is presumably the price of
wool landed and warehoused, while the value for import records
should exclude the cost of unloading and moving.

Exports.

The quantities and values of exported goods are filled in
by the shipper or agent, and the papers sent through the
Custom House officials or directly to the central office within
six days of the ship's clearing. The specification given on
p. 49 is an abridgment of the form used:

The  forms  for  British  and  Irish  goods  are  distinct  from 
those  for  foreign,  free  and  duty-paid,  goods ;  and  there  are 
distinct  export  forms  for  transhipments,  which  have  already 
been  registered  as  imports.  In  these  cases  the  specification 
and  quantities  are  likely  to  be  correct,  but  there  are  causes 
which  may  falsify  the  values.  If  they  are  to  be  subject  to  an 
ad  valorem  duty,  they  may  be  undervalued ;  if  they  are  adulter¬ 
ated  goods,  masquerading  as  genuine,  they  may  be  over¬ 
valued.  It  seems  hardly  possible  to  estimate  these  errors. 

Definition of official imports and exports.

We are now in a position to define imports and exports
according to their meaning in the Board of Trade Returns;
as, for instance, when for 1913 the value of imports is stated
as £769,000,000, and of exports as £635,000,000, of which
£110,000,000 are re-exports of foreign or colonial goods. In
the following statement the details already shown are
supplemented from the definitions given in recent years in the
introduction to the Annual Statement of the Trade of the
United Kingdom.

Under imports are included all goods landed through the
custom-houses, including goods immediately shipped as stores
or returned from customers unused, with the following
exceptions: (a) fish of British taking landed in British ships
arriving direct from the fishing grounds, goods directly imported
by ambassadors and ministers accredited to this kingdom, old
vessels bought from foreigners; and (b) sacks, cases, etc.,
used as packages, passengers' luggage, ships' stores, ballast,
and military and naval stores on board Government vessels,
goods transhipped under bond, and goods in transit through
the country on a through bill of lading (of which separate
accounts are given), and goods unlanded and so reported.

Dated 13th October 1896.          (Countersigned) ________
                                                  Officer of Customs.

† A column headed "Final Destination of Goods" has been added since 1904.

Under exports are included all goods entered on ships'
bills of lading, excluding the classes after (b) in the previous
paragraph; new ships leaving our shores, sold to foreigners,
have been included since 1899.

Goods  immediately  reshipped  at  the  same  or  another  port, 
or  held  in  bond  and  then  reshipped,  are  included  in  imports, 
and  in  exports  are  distinguished  as  Exports  of  Foreign  and 
Colonial  Produce. 

Bullion  and  coin  are  not  included  in  the  general  totals  of 
imports  or  of  exports,  but  are  recorded  in  separate  tables. 
Coin  carried  privately  and  the  great  part  of  diamonds  imported 
or  exported  (a  quite  important  item)  are  not  recorded. 

The  treatment  of  coal  throws  light  on  these  paragraphs. 
Coal  taken  for  use  on  the  voyage  is  registered,  but  not  included 
among  exports ;  coal  as  cargo  is  included. 

The value of imports reckoned is the nominal exchange
value just before they are landed, and so includes all payments
due to foreigners, shippers, underwriters, etc., and shipping
dues, and none to stevedores, dock-labourers, etc. The value of
exports is the value "free on board." The exact definition of
the values, here and in other countries, is of primary importance
in studying the balance of trade.*

Great difficulty is experienced in classifying exports according
to their countries of destination and imports according to
their countries of origin; the details first asked for in 1904
(see notes on pp. 46 and 49) have led to greater accuracy and
definiteness on these questions. In the accounts of trade
there have been since 1904 two sets of tables, and the newer
ones relating to countries of consignment are now given the
greater importance.†

Very great care is necessary in using the accounts of foreign
trade during the war period. The class named above, "military
and naval stores on board Government vessels," excluded
from the accounts, assumed vast dimensions.

* See the Reports of the Committee of the British Association on The
Accuracy ... of ... Statistics of International Trade, 1904 and 1905.

† See Committee on Trades Records (Cd. 4346), and compare a current
Statistical Abstract of the United Kingdom with those issued circa 1910 and 1903.


A  very  good  example  of  an  official  inquiry  is  to  be  found 
in  the  Census  of  Production  (1907)  of  which  the  results  were 
published  in  1912  (Cd.  6320).*  Special  attention  may  be 
directed to the relation between the quaesitum, the ultimate
object  of  the  inquiry,  the  data  which  it  proved  to  be  possible 
to  collect,  and  the  adjustment  of  the  questions  so  that  the 
answers  could  readily  and  accurately  be  given  by  the  employers 
in  various  industries. 


*  Examples  of  the  Blank  Schedules  used  can  be  seen  at  the  School  of 
Economics. 


CHAPTER  IV. 
TABULATION.


Leaving  now  the  consideration  of  blank  forms  of  inquiry, 
let  us  turn  to  the  methods  by  which  our  data,  accumulated  on 
these forms, can be tabulated. At first sight the tabulation
of  so  many  million  census  forms,  so  many  schedules  of  wages, 
and  so  many  lists  of  goods  imported,  seems  mere  office  work, 
to  be  done  mechanically,  only  requiring  accuracy  and  not 
subject  to  scientific  analysis.  Tabulation  does,  indeed,  involve 
a  great  deal  of  automatic  labour;  but  the  determination  of 
the  exact  form  of  the  table  and  the  choice  of  the  headings  to 
which  the  totals  shall  correspond  task  the  administrative 
statistician,  and  are  worth  the  closest  study. 

The function of tabulation.

The function of tabulation in the general scheme of a
statistical investigation is sufficiently definite; it is to arrange
in easily accessible form the answers to those questions with
which the investigation is concerned. If it is required to know,
for instance, the number of persons of each sex and age-group
in all the districts of the country, the figures in the table
must show these numbers. Or, to take a less definite problem,
we want all the information possible as to annual earnings. In
studying the forms issued for the Wage Census, we have seen that
the information which can be obtained is not precisely that
which we require. The problem then is so to tabulate our
information that our totals may give answers as near to our
requirements as possible, and it can easily be found by
experiment that the way to do this is by no means obvious.

Not only must the figures be grouped so as to answer the
questions put forward in the original scheme, but if the
information is of wide and varied interest, as in all the
investigations so far considered, the data must be studied from
many points of view, and tabulated so that students in all
branches of knowledge may be able to extract from our tables the
information they require. Thus the population census is used by
the financier, the legislator, the merchant, and the commercial
traveller; political economists turn to it for light on the
development of industry, and on the change of numbers in
each trade; those interested in social questions will study
the ages and sex-distribution in various districts or
occupations; the sociologist and biologist will need accurate
information as to the growth of population and the change of age
distribution.

To  take  more  specific  points,  the  blue-book  which  contains 
the  tabulation  of  foreign  trade  statistics  will  be  expected  to 
show  how  our  trade  with  each  country  is  developing,  whether 
we  are  holding  or  improving  our  position  in  certain  markets ; 
whether  we  are  exhausting  our  supply  of  raw  materials ; 
whether  some  new  commodity  is  yet  of  importance.  It  must 
be  remembered  that  the  original  material  is  not  accessible  to 
the  public,  that  they  are  dependent  on  the  information  ex¬ 
tracted  for  them,  and  that,  though  it  would  be  possible  to 
turn  through  all  the  forms  for  special  data,  yet  the  labour 
needed  would  be  prohibitive,  while  a  little  more  detail  in 
the  tabulation  might  easily  have  isolated  the  information 
needed. 

Tabulation and characteristics.

The method of tabulation should be taken in relation to
the conception of characteristics explained above (p. 20).
Each person or thing in a group possesses certain adequately
defined characteristics, say A, B, C, and D. They also possess
one or other of the characteristics E_1, E_2, E_3 . . ., and one
or other of F_1, F_2, F_3 . . ., etc. A table in single tabulation
shows separately the totals under each characteristic, E_1, E_2,
etc. The heading of the table gives directly or by reference the
definitions of A, B, C, and D, and contains frequently some such
phrase as "in each locality" if the E characteristic is a
locality. Each line in the first column then defines an E. A
double tabulation shows the classification both by E and by F,
the heading of each column defining an F, so that an entry shows
the number of persons who possess, say, the characteristics A, B,
C, D, E_2, and F_3. The horizontal totals show the totals who have
characteristics E_1, E_2, etc., and the totals of the columns
relate similarly to F_1, F_2, etc.

Three groups of tabulations.

For convenience, the methods of tabulation may be divided
into three groups: A. The simple statement of totals of
persons or things which satisfy given conditions, such as
number living in a town, or the total value of imports from
France; B. The grouping of a great number of units in relation
to some particular property possessed by all, with the object,
not of answering assigned questions, but of putting the material
in a form ready for use in further investigations, e.g., the
population according to ages, or wage-earners according to the
value of their wages; C. The tabulation of non-numerical answers
in suitable groups to give a view of the whole, e.g., the causes
of strikes or the state of employment. The division between
groups A and B is not always definite.

In  the  tabulation  the  convenience  of  the  reader  must  be 
studied.  The  table  must  be  so  arranged  that  any  totals 
required  can  instantly  be  found.  This  is  to  a  great  extent  a 
question  of  typography,  the  use  of  suitable  founts  for  figures 
and  headings,  and  also  of  the  choice  of  the  right  shape  and 
size  of  page.  Supposing  the  best  possible  choice  made  in 
these  respects,  our  rule  will  then  be  to  get  the  maximum 
amount  of  information  into  a  given  space. 

Classes of tabulation.

Group A. Thus we can have single tabulation, answering
one or more groups of independent questions:


Number and Membership of Trade Unions.*

Year.     Number of Trade Unions     Total Membership of these
          at end of Year.            Unions at end of Year.

1896             1,317                      1,493,375
1897             1,307                      1,611,384
1898             1,267                      1,644,591

Double tabulation shows the subdivision of a total according
to two categories, in the example given on p. 55,
according to sex and age:


*  Compiled  from  the  Sixth  Annual  Abstract  of  Labour  Statistics ,  p.  1. 


Number of Males and Females of all ages engaged in certain occupations in England and Wales and Scotland in 1881 and 1891.

(No. = numbers employed, 000's omitted. Rate = per 10,000 Males, or Females, above 10 years of age.)

ENGLAND AND WALES.
                                              Males.                          Females.
                                       1881.          1891.            1881.          1891.
                                       No.   Rate.     No.   Rate.     No.   Rate.     No.   Rate.
Cotton Manufactures                    185     199     212     201     303     303     332     290
Woollen Manufactures                    93     100     102      96     123     123     132     115
Flax and Linen Manufactures              4       4       3       3       8       8       6       5
Boot and Shoe Manufactures             188     202     202     191      36      36      46      40
All occupations                      7,753   8,324   8,805   8,314   3,402   3,405   3,945   3,442
Population above 10 years of age     9,314      ..  10,592      ..   9,992      ..  11,461      ..

SCOTLAND.
                                              Males.                          Females.
                                       1881.          1891.            1881.          1891.
                                       No.   Rate.     No.   Rate.     No.   Rate.     No.   Rate.
Cotton Manufactures                      3      23       6      42      26     175      13      82
Woollen Manufactures                    14     104      14      94      16     111      18     115
Flax and Linen Manufactures              8      67       7      48      20     136      19     120
Boot and Shoe Manufactures              22     170      19     129       2      14       3      19
All occupations                      1,092   8,3?2   1,204   8,32?     484   3,313     544   3,400
Population above 10 years of age     1,313      ..   1,446      ..   1,462      ..   1,599      ..

TOTALS (000's omitted).
                                              1881.                        1891.
                                     Males.  Females.     All.    Males.  Females.     All.
Cotton Manufactures                     188       329      517       218       345      563
Woollen Manufactures                    107       139      246       116       150      266
Flax and Linen Manufactures              12        28       40        10        25       35
Boot and Shoe Manufactures              210        38      248       221        49      270
All occupations                       8,845     3,886   12,731    10,009     4,489   14,498
Population above 10 years of age     10,627    11,454   22,081    12,038    13,060   25,098

To face page 55.
Classification of Paupers in Ireland. Total Numbers who
received Relief during the Year ended Lady Day 1892.*

Ages of Persons Relieved.       Males.     Females.     Total.

Under 16 years                  44,391      43,648      88,039
Of 16 and under 65 years       132,370      79,045     211,415
Of 65 years and upwards         35,121      45,668      80,789

All ages                       211,882     168,361     380,243

More  information  may  be  included  thus  : — 

Classification of Paupers in England and Wales. Total
Numbers who received Relief during the Year ended Lady Day
1892.†

Ages of Persons Relieved.      Indoor.     Outdoor.      Total.     Metropolis.   Other Parts of
                                                                                  England and Wales.

Under 16 years                 111,782      441,805      553,587      100,671          452,916
Of 16 and under 65 years       232,284      385,299      617,583      148,066          469,517
Of 65 years and upwards        114,144      287,760      401,904       64,779          337,125

All ages                       458,210    1,114,864    1,573,074      313,516        1,259,558
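
A double tabulation with cross totals carries its own check, since every row and every column must add to its printed total. The following Python sketch applies that check to the Irish figures quoted above. The function name `check_margins` is illustrative only, and the entry for males of 65 years and upwards is taken as 35,121, the value implied by both its row total and its column total.

```python
# Males, Females, Total for each age group (Ireland, year ended Lady Day 1892).
rows = {
    "Under 16 years": (44391, 43648, 88039),
    "Of 16 and under 65 years": (132370, 79045, 211415),
    "Of 65 years and upwards": (35121, 45668, 80789),
}
all_ages = (211882, 168361, 380243)


def check_margins(rows, totals):
    """Return the labels of every row or column whose entries do not add up."""
    errors = []
    for label, (males, females, total) in rows.items():
        if males + females != total:
            errors.append(label)
    for i, name in enumerate(("Males", "Females", "Total")):
        if sum(r[i] for r in rows.values()) != totals[i]:
            errors.append(name)
    return errors


print(check_margins(rows, all_ages))

# A single misprinted entry is located by the row and the column that fail.
misprinted = dict(rows)
misprinted["Of 65 years and upwards"] = (35221, 45668, 80789)
print(check_margins(misprinted, all_ages))
```

A single wrong entry disturbs exactly one row and one column, so the two failing totals point to the cell at fault.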

A  treble  tabulation  can  be  used,  subdividing  the  total 
into  three  distinct  categories,  with  cross  totals  for  each  group. 
Thus  the  table  on  p.  56  gives  separate  divisions  according  to 
age,  sex,  and  district ;  percentage  lines,  in  a  distinct  type, 
are  also  introduced  : — 

The  same  process  can  be  further  extended  :  the  example 
in  the  table  opposite  shows  an  arrangement  for  a  quadruple 
tabulation,  distribution  by  district,  date,  sex,  and  industry, 
with  subsidiary  information ;  but  it  is  generally  better  to  use 
two  or  more  tables  than  to  increase  the  complication,  unless 
it  is  necessary  to  bring  several  categories  into  close  relation. 
Suitable  varieties  of  type  will  often  make  comparisons  easy  in 
a  very  complex  table. 

* Compiled from the Sixth Annual Abstract of Labour Statistics, p. 102.

† Ibid., p. 101.
Tabulation of census material.

Looking now at the census householders' schedule (facing
p. 22), we can see that there are about thirteen different items
of information about each person: district, position in family,
condition as to marriage, children, sex, age, occupation,
industry, industrial status, infirmity, birthplace, nationality,
and house-room. These could be tabulated in 78 different double,
286 treble, or 715 quadruple tabulations, so that there is plenty
of scope for choice.

Mr. Booth's tabulation.

To fix our ideas, we will take occupation as the main
subdivision, and examine Mr. Booth's use of the census returns,
say for London Printers.*

First  he  gives  a  treble  classification — occupation,  sex,  and 
age — using  columns  corresponding  to  3,  4  and  10  of  the  1911 
schedule. 


Census Divisions, 1891.     Females.                Males.                  Total.
                            All Ages.      -19.     20-54.      55-.

1. Printer                    1,316       9,988     21,784     1,921       35,009
2. Lithographer, &c.            809         757      3,037       437        5,040

Total                         2,125      10,745     24,821     2,358       40,049

Then  follows  a  single  table,  district  and  numbers,  using  the 
information  on  the  back  of  the  schedule. 


Distribution.

   E.         N.       W. & C.       S.        Total.

 5,884      9,835      7,577      16,753      40,049

Three  simple  tables  are  then  given,  relating  to  heads  of 
families,  using  columns  2,  3  and  4  (sex),  2  and  14  (birthplace), 
and  2  and  12  (industrial  status). 


* Life and Labour of the People, vol. vin p. 189.
His next table uses columns 2 and 10, and is as follows:


Total Population Concerned.

                       Heads of      Others       Unoccupied.    Servants.     Total.
                       Families.     Occupied.

Total                   18,048        16,060        47,257          854        82,219
Average in Family          1            .89          2.62           .05         4.56

The  next  table  (not  here  given)  is  a  single  classification 
according  to  number  of  rooms  and  servants,  a  most  ingenious 
indirect  use  of  the  scheduled  information ;  and  the  last  is  an 
example  of  the  legitimate  use  of  a  quadruple  tabulation — 
occupation,  industrial  status,  sex,  and  age — given  on  the 
next  page. 

The census tabulations.

It would be difficult to find a better example of tabulation
of a great multitude of details to serve a special purpose. The
census authorities had in many cases not tabulated the necessary
details, and it was necessary to turn through the original
schedules to get at the facts. For such work as this, the
function of tabulation is simply to provide the answers to
definite questions. Thus the census reports show how many persons
of each sex and age-group belong to certain industries in certain
places, in a quadruple tabulation extending over many pages, each
page relating to one district, and this table may be used for
accomplishing many separate purposes: each item is already a
total ready for use. It is impracticable from limits of time and
space, even if it were desirable, to tabulate all the possible
groups of qualities which can be made from all the statements on
each census form; a good tabulation will aim at providing only
those statements which are of practical use. Thus many simply
descriptive totals are given, such as the numbers of each sex and
age in each parish in the United Kingdom, to serve primarily for
administrative purposes; and many statements which will afford
the economist and sociologist the opportunity of tracing the
progress of industries, of studying the ages of workpeople in
different occupations, the changes in age-grouping of the nation;
and some further tables might no doubt be given to throw light on
problems of special interest. In each successive census new
tables are to be found.


Status as to Employment (according to Census Enumeration).

Minutiae.

It is interesting to open one of these great tables of figures,
such as are generally to be found forming the bulk of a
blue-book, and taking a figure at random, ask "Why is this figure
printed, what question does it answer, to whom can it give
information?" For instance, in the Eighth Report on Trade Unions,
p. 257, we find that the United Brickworkers' and Brick Wharf
Labourers' Union spent £20 on funeral expenses in 1894, an
average of some 3s. 7d. per member. As an isolated statement this
may interest a very small number of persons; but that small
number has a right to expect that they shall find the figures
relating to their union tabulated in a general official book; to
them it may be as important as the item, on the same page, of
£5,481 spent by the Boilermakers. From this point of view, the
question of inclusion of such small items is simply one of space.
If space is limited, a selection would be made of larger
quantities only, as being likely to concern more people.

Importance of raw material.

But there is a reason of quite another character for printing
such items as these. The raw material, on which the totals
in such tables are based, is not accessible to the student
except by means of this Report. Now, the compiler of these
statistics cannot know from what particular point of view they
will be studied. It may be desired to examine and group trade
unions according to their expenditure on different items, to
study their history, classifying them as fighting organisms and
as friendly societies. The tabulations needed cannot well be
foretold. The material is therefore given in the rough, in order
that the tabulation may be made by each student according to his
needs. At the same time the most suggestive totals are given as
one of these possible methods of tabulation; and in the summary
of such a report, the items are retabulated, the rough material
being omitted, in those ways which the editor thinks most
useful.

Selection of raw material.

When space is much too limited for any publication in
extenso of the items, a careful selection must be made of those
to be printed; and it is this selection that is generally open
to most criticism.

The Census supplies an illustration from the County Borough
of Coventry, 1911,* where the following detail is given for
115 persons:


Brick, Cement, Pottery and Glass. Males.

Age       10-  13-  14-  15-  16-  17-  18-  19-  20-  25-  35-  45-  55-  65-  Totals.

Workers   -  -  2  3  2  3  1  4  12  36  23  17  4       Total 107
Dealers   -  1  4  -  1  1  1                             Total 8

while  all  the  males — masters,  foremen,  skilled  workmen, 
labourers  and  boys — engaged  in  the  cycle  and  motor-car 
trade  are  shown  in  no  more  detail  than  : — 


Vehicles.

Age       10-  13-  14-  15-  16-  17-  18-  19-  20-  25-  35-  45-  55-  65-  Totals.

Cycle and Motor Car: Makers, Mechanics
          1  192  271  303  325  371  372  2,003  3,872  2,488  1,122  379  76       Total 11,775
Motor Car: Makers, Mechanics
          -  -  70  108  143  160  219  208  1,299  2,324  1,376  537  158  36       Total 6,838
Others
          2  1  4  8  7  31  62  24  21  4       Total 213

It  is  explained  on  p.  iii  of  the  volume  that  full  particulars 
(by  age)  of  relatively  important  occupations  in  a  district  are 
shown  in  italics. 

Economy of space.

In such cases, two useful rules might be applied: omit all
numbers under, say, 500 when by so doing a line of print
would be saved; and give all numbers over 10,000 correctly
only to the nearest 100, and so for other digits in proportion,
thereby reducing the width of columns of print. If, for example,
we knew to the nearest 100 the exact numbers in each district
and occupation in which as many as 1000 were employed, our
knowledge would be as complete as we needed; and it is doubtful
whether the space occupied by such a tabulation would be more
than that already devoted to the subject. In many cases, on the
other hand, it is essential to have the raw material quite
unchanged. Each tabulation must be judged on its own merits.
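
The two rules can be sketched in Python as follows. The reading of "other digits in proportion" as three significant figures for every number over 10,000 is an assumption of this sketch, as are the function names and the figures, which are invented for illustration.

```python
def round_entry(n, round_from=10_000):
    """Numbers over `round_from` are kept to three significant figures,
    so 11,775 becomes 11,800 and 123,456 becomes 123,000 (assumed reading)."""
    if n <= round_from:
        return n
    unit = 10 ** (len(str(n)) - 3)
    return (n + unit // 2) // unit * unit


def abridge_table(rows, omit_below=500):
    """Drop a line only when every number in it is under `omit_below`,
    since only then is a line of print saved; round what remains."""
    abridged = {}
    for label, values in rows.items():
        if max(values) < omit_below:
            continue
        abridged[label] = [round_entry(v) for v in values]
    return abridged


# Hypothetical figures for two occupations in three age-groups.
rows = {
    "Occupation A": [11775, 2003, 379],
    "Occupation B": [213, 31, 4],
}
print(abridge_table(rows))
```

Small numbers inside a retained line are kept as they stand, because omitting them would save no space.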

Tabulation of the Poor Law Returns, 1833.

It may be useful to take a particular group of answers, and
discuss what tabulations will throw most light on the questions
at issue. The Poor Law Commissioners of 1833 collected
information from a thousand villages in England and Wales on the
following six points among others: the wages of an agricultural
labourer in summer and in winter, both with and without the
inclusion of beer as part payment, his annual earnings, and the
subsidiary earnings of his wife and children. It may be supposed
that the chief object of the Commissioners was to find whether
the labourers' families earned enough for their support, and
what proportion was earned by the wives and children.


* Census Report, Cd. 7019, p. 597.

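The following scheme of tabulation would show in what
counties the labourer was badly off:

              |       Average Annual Earnings of
  County.     |-----------------------------------------
              |    Man.    |   Family.   |  Together.
--------------+------------+-------------+--------------
              |            |             |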

The counties might be taken in alphabetical order for convenience
of reference, or in geographical order with subordinate
averages  for  groups  (e.g.,  Eastern:  Norfolk,  Suffolk,  Essex); 
or  the  counties  might  be  arranged  in  the  order  of  the  total 
earnings,  so  that  it  could  be  seen  at  a  glance  in  which  counties 
the  labourers  were  worst  off. 

To  show  the  number  of  villages,  county  by  county,  in  which 
the  earnings  were  below  a  certain  minimum,  or  within  certain 
limits,  the  table  given  on  p.  63  might  be  used. 

This table can be used in the above complex form or simplified.
The number of subdivisions of money to be distinguished
depends  on  the  space  at  disposal  and  on  the  number  of  villages 
which  would  be  entered  in  each.  A  table  in  which  most  of 
the  entries  are  1  or  0  is  open  to  criticism.  In  the  above  table 
the  villages  are  too  few  to  allow  accuracy  in  percentage. 

It will be seen that this table would furnish the answer to
almost all questions which could be put as to total earnings.

Tabulation to show correlation.

For instance, if we wish to see the relation between total
earnings and the family’s subsidiary contribution, we should
look at the smallest totals in the last column but one and see
if they corresponded with the largest percentage of family
earnings. If we found signs of correspondence we should
rearrange the counties in the order of these subsidiary
percentages, and see if they were approximately in order of
total earnings also. This is an example of tabulation to show
correlation, the correspondence in the occurrence of two sets
of phenomena.

Wages and earnings.

Another important group of questions arising in connection
with these tables is: What is the relation between weekly
wages and annual earnings, and what proportion of the wage is
generally paid in kind? We shall


Annual Earnings of Men and Families.

Number of Villages in which the Total Earnings averaged:
(1) not above £25; (2) above £25 and not above £30; (3) above £30
and not above £35; (4) above £35 and not above £40; (5) above £40
and not above £45; (6) above £45 and not above £50; (7) above £50.

                                                                Average Earnings in     Percentage
                          Number of Villages                    County of               of Family
                                                                                        Earnings
                        (1)  (2)  (3)  (4)  (5)  (6)  (7)      Man.    Family.  Total.  to Total.

In Norfolk               0    1    3    6    4    3    2       £30     £11      £41      27
  Percentages of Total
  Number of Villages     0    5   16  31½   21   16  10½       ...     ...      ...      ...

In Suffolk               0    3    4    5    3    2    2       £28     £11      £39      28
  Percentages of Total
  Number of Villages     0   16   21   26   16  10½  10½       ...     ...      ...      ...

In Essex                 1    3    6    7   10    3    1       £28     £10      £38      26
  Percentages of Total
  Number of Villages     3   10   19   23   32   10    3       ...     ...      ...      ...

In Eastern Counties      1    7   13   18   17    8    5       £28 10  £10 10   £39      27
  Percentages of Total
  Number of Villages     1   10   19   26   25   12    7       ...     ...      ...      ...

not now require the statements as to subsidiary family earnings.
In records of agricultural wages the most common statement
was, e.g., “wages in this district are from 10s. to 12s. a week.”
Now, a farm labourer did not generally earn as much in winter
as in summer, because wages were reduced to correspond to
the smaller amount of work necessitated by failing light; from
this cause annual earnings will be less than the weekly wage
multiplied by 52. Besides this wage he generally receives
special money at hay and wheat harvests, and also many
payments in kind, such as daily beer, house and ground at
reduced rent, and other privileges. It is generally best to
value all these, and compute his earnings thus:


                                   £   s.  d.
10s. for 38 weeks                 19   0   0
12s. for 9 weeks (summer)          5   8   0
Hay harvest, 1 week                0  15   0
Wheat harvest, 4 weeks             5   0   0
Beer, 1s. per week                 2  12   0
Cottage and ground                 5   0   0
Other perquisites                  1   5   0
                                 ------------
                                 £39   0   0  =  15s. per week.
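The addition can be checked mechanically by reducing every item to pence. The sketch below is only an illustration; the helper `to_pence` and the variable names are hypothetical, and the 10s. against which the result is compared is the ordinary weekly wage of the first line of the account.

```python
from fractions import Fraction


def to_pence(pounds=0, shillings=0, pence=0):
    """Reduce a sum in pounds, shillings and pence to pence."""
    return (pounds * 20 + shillings) * 12 + pence


items = {
    "10s. for 38 weeks": to_pence(shillings=10 * 38),
    "12s. for 9 weeks (summer)": to_pence(shillings=12 * 9),
    "Hay harvest, 1 week": to_pence(shillings=15),
    "Wheat harvest, 4 weeks": to_pence(pounds=5),
    "Beer, 1s. per week": to_pence(shillings=52),
    "Cottage and ground": to_pence(pounds=5),
    "Other perquisites": to_pence(pounds=1, shillings=5),
}

total_pence = sum(items.values())
pounds, rest = divmod(total_pence, 240)
shillings, pence = divmod(rest, 12)

# average earnings per week of the year, in shillings
weekly_shillings = Fraction(total_pence, 12 * 52)
# excess of weekly earnings over the ordinary 10s. wage
excess_over_wage = (weekly_shillings - 10) / 10

print(pounds, shillings, pence, weekly_shillings, excess_over_wage)
```

Exact fractions are used so that the weekly figure comes out as a whole number of shillings without rounding.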

In this case earnings are 50 per cent. above the general
weekly wage. An estimate of this nature has been made by
the late Mr. Little for each county for 1867-70 and 1892.

Winter and summer wages.

The question, Are winter wages generally below summer
wages, and by how much? can be answered by the following
scheme of tabulation, which uses the data not employed in the
previous tables:


                          Average Weekly     Number of Villages where the Excess of
                          Wage in            Summer Wages over Winter was
Counties.
                          Summer.  Winter.   Nothing.  6d.   1s.   1s. 6d.   2s.   More
                                                                                   than 2s.
                          s. d.    s. d.

Norfolk                   11  2    10  3        13      2     3      2        5      3
  Percentage of Number
  of Villages included                          46      7    11      7       18     11

Suffolk                   10  2     9  8        24      0     6      1        2      1
  Percentage of Number
  of Villages included                          70      0    18      3        6      3

Essex                     10  9     9 10        22      0    11      0        5      4
  Percentage of Number
  of Villages included                          52      0    26      0       12     10

Eastern Counties          10  6     9 11        59      2    20      3       12      8
  Percentage of Number
  of Villages included                          57      2    19      3       12      8
These examples do not quite exhaust the useful tabulations
of  these  groups  of  figures,  for  we  have  not  yet  examined  the 
distribution  of  wages,  that  is  the  relative  numbers  paid  at 
different  rates.  These  returns  do  not,  however,  illustrate 
such  a  tabulation  well,  for  we  are  not  told  the  rates  paid  to 
individuals,  but  only  the  rate  prevalent  in  the  villages. 

Group  B. — The  grouping  according  to  wages  affords  an 
example  of  the  second  method  of  tabulation.  We  have  now 
no definite questions to answer, as in the method so far discussed,
but a more general problem: given a mass of data, it
is  required  to  tabulate  it,  so  as  to  present  the  maximum 
amount  of  useful  information.  Our  raw  material  is  so  many 
thousand  isolated  statements,  which  must  be  focussed,  made 
to  present  definite  meaning,  and  worked  up  so  as  to  be  useful 
for  future  comparison. 

Statistics whose purpose is not immediate.

Some investigations are undertaken not to answer any
definite questions or to throw light on any given problem,
but to collect information which, though it has no immediate
use, is likely to be needed ultimately by many investigators
occupied with various questions. Such is a wage census. So
long as we have no sufficient account of wages, we are badly
informed as to one of the most important measurements of the
social body, and economists and statisticians are continually
hindered by the want of data essential for their work; but the
census has no immediate practical use, for knowing the height
of wages does not help us directly to regulate that height. In
such an investigation our object will be to examine the figures,
and give all the groupings and averages which seem likely to be
useful for any purpose; and while doing this we shall
imperceptibly pass to a different class of investigation; we
shall be finding a structure underlying our multifarious
details; we shall find that the chaos, which our figures present
at first sight, obeys laws; we shall be making a visible outline,
and giving a definite shape to our apparently featureless mass.

The  complete  discussion  of  this  problem  belongs  to  a  later 
chapter;  but  the  tabulation  can  be  begun  without  special 
technique.  The  examples  taken  will  relate  chiefly  to  wages, 
but  the  methods  are  quite  general. 

Selection of limits of groups.

In the American Report on Wholesale Prices, Wages and
Transportation of 1891, the wages of some 10,000 persons are
detailed. It is proposed to consider their tabulation as a
homogeneous group. The results are given on pp. 69, 70. In the
original publication the wages are given to half a cent; in the
second column, on p. 69, the numbers of wage-earners are given
in 10-cent groups, from $.25 to $.34, $.35 to $.44, and so on,
those earning wages exactly at the dividing points being always
placed in the division below. Notice that the average wage of
such a group as $2.15 to $2.24 is not $2.20 if the wage-earners
are evenly distributed cent by cent, but the average of $2.15,
$2.16, . . . $2.24, i.e., $2.195.
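The point about the group $2.15 to $2.24 is a matter of simple arithmetic, which the following lines verify. Working in whole cents is a choice made here to avoid decimal rounding; the variable names are illustrative only.

```python
# The ten cent-values of the group: $2.15, $2.16, ..., $2.24
cents = range(215, 225)

# With one wage-earner (or equal numbers) at each cent, the group
# average is the plain mean of the ten values.
average_cents = sum(cents) / len(cents)
average_dollars = average_cents / 100

print(average_cents)   # 219.5 cents, i.e. $2.195 and not $2.20
```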

Looking  at  column  2,  we  shall  see  that  the  figures  present 
no  order,  follow  no  rule ;  no  structure  has  yet  been  found,  our 
divisions  are  too  narrow  for  our  material. 

Now group the wage-earners with wider limits, as in
column 6, where the numbers earning in half-dollar groups
are given; we have here a nearly regular sequence of numbers
falling after the maximum in the second group. Going back
to narrower limits, to find exactly at what divisions this
regularity is first in evidence, we have in column 4 the numbers
in 20-cent groups which show considerable, but not absolute
regularity. The numbers in 30-cent groups * are successively
75, 355, 674, 1242, 740, 660, 343, 310, 180, 181, 233, 32, 82,
3, 4, 8, 1, almost completely regular except for the large group
at $3.25 to $3.55.

The  question  as  to  which  of  these  groupings  should  be 
selected  is  to  be  decided  by  the  number  of  separate  items  the 
eye  can  instantaneously  grasp.  In  looking  at  the  51  numbers 
in  the  10-cent  groups,  or  the  26  in  the  20-cent,  the  meaning  is 
lost  in  a  maze  of  figures  (though  as  many  details  as  these 
could  be  properly  shown  in  a  diagram),  but  the  11  numbers 
in  the  half-dollar  groups  are  easily  comprehended. 

Stated  in  words,  the  result  of  our  tabulation  (column  7)  is 
that  6  per  cent,  of  the  wage-earners  made  from  $.25  to  $.74, 
29  per  cent,  from  $.75  to  $1.24,  and  so  on. 
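The percentages of column 7 follow directly from the half-dollar counts of column 6. A minimal sketch, assuming the eleven counts as they stand in the table on p. 69 (the list name is illustrative):

```python
# Column 6: numbers in the half-dollar groups, $.25 to $.74 first,
# the single earner at $5.35 last.
half_dollar_counts = [317, 1472, 1297, 970, 506, 198, 254, 96, 4, 8, 1]

total = sum(half_dollar_counts)
percentages = [round(100 * n / total, 1) for n in half_dollar_counts]

for count, share in zip(half_dollar_counts, percentages):
    print(f"{count:>6}  {share:>5}")
```

The first two shares, 6.2 and 28.7, are the "6 per cent." and "29 per cent." of the text.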

Practical tabulation.

For the practical work of the tabulation from the original
figures, we should take ruled sheets, enter at the head of
successive columns certain wage limits, and turning through
the items enter each wage by a dash in its appropriate column,
grouping them in fives and tens, to facilitate addition.

* Vide, p. 97, infra.

From the preceding paragraphs it is clear that we do not
need to take separate columns for each cent from $.25 to
$5.35 for tabulation, but a little consideration is necessary to
see how minute the limits should be to give the correct average.
Suppose the entries in cent groups to be:


$1.70      $1.71      $1.72      $1.73      $1.74

[Tally of 51 entries, marked by strokes in groups of five under
the five headings above.]

The average of the wages so entered can be quickly calculated
as $1.718.

If, on the other hand, we put all the 51 entries as simply
“between $1.70 and $1.74,” or more exactly “as much as
$1.70 but less than $1.75,” we should naturally take them to
be all (for purposes of averaging) at the middle point of this
group, viz., $1.72.

If  we  have  a  sufficient  number  of  items,  the  differences 
between  the  average  assumed  and  that  calculated  for  each 
group  will  be  very  slight.  This  is  seen  on  p.  69;  column  8 
gives  the  averages  calculated  from  the  entries  in  10-cent 
groups,  while  column  9  gives  them  on  the  hypothesis  that  for 
purposes  of  averaging  the  numbers  in  the  half-dollar  groups 
may  all  be  taken  at  the  middle  points  of  their  groups.  The 
difference  is  greatest  in  the  first  and  last,  the  smallest  groups. 
The  general  average  obtained  from  column  9  is  $1.70,  which  is 
the  nearest  round  number  to  the  true  average  $1.73.  Hence, 
for  the  purpose  of  obtaining  the  general  grouping  and  average,  we 
need  only  take  11  half-dollar  columns  for  marking  in  our  items. 
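The comparison of columns 8 and 9 can be reproduced from the table on p. 69. In this sketch the counts and group averages are those of columns 6 and 8; the middle points are supplied by the stated hypothesis, and the single earner "at $5.35" is taken at $5.35 in both lists, which is an assumption made here and does not affect the result to the nearest cent.

```python
counts = [317, 1472, 1297, 970, 506, 198, 254, 96, 4, 8, 1]

# Column 8: averages calculated from the 10-cent entries.
group_averages = [0.62, 1.09, 1.49, 1.99, 2.53, 3.04, 3.51,
                  4.00, 4.50, 5.00, 5.35]

# Column 9: every earner taken at the middle point of the group.
middle_points = [0.50, 1.00, 1.50, 2.00, 2.50, 3.00, 3.50,
                 4.00, 4.50, 5.00, 5.35]


def weighted_mean(values, weights):
    """Mean of `values`, each counted `weights` times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


average_from_groups = weighted_mean(group_averages, counts)
average_from_middles = weighted_mean(middle_points, counts)

print(round(average_from_groups, 2), round(average_from_middles, 2))
```

The two results differ by about three cents on a wage of $1.73, which is the sense in which eleven half-dollar columns suffice for the general average.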

For  other  purposes  it  may  be  advisable  to  work  more 
minutely ;  for  in  the  lowest  group,  we  shall  wish  to  know  how 
many  are  earning  $.25,  $.30,  $.35  separately,  for  5  cents  is  a 
perceptible  difference  on  25  cents.  At  the  top  also  it  may  be 
useful  to  know  the  exact  wages. 

The Galtonic method.

More minute entries again will be needed for the second
method of tabulation, which is as follows: Suppose all the
wage-earners to be arranged in order of the magnitude of their
wages, those at $.25 at one end, those at $5.35 at the other.
Note the wages of men at given points in the row. The lowest
wage is $.25; one-tenth of the way along, that of the 512th
worker is between $.85 and $.95, . . . half-way up the wage is
$1.50. The figures at each tenth are given on p. 70. By this
means we get a very vivid idea of the distribution according
to wages.

These  numbers  cannot  be  obtained  accurately  if  we  have 
only  entered  the  details  correct  to  half-dollars,  but  can  be 
found from the 10-cent grouping, which is therefore the
classification to be adopted. We must first determine in which of
the  small  groups  the  men  one-tenth,  two-tenths  ...  up  the 
group  lie,  and  then  estimate  their  position  inside  the  smaller 
group.  Thus,  if  we  want  the  figure  more  accurately  than 
“  between  $.85  and  $.95,”  as  given  above,  we  proceed  as 
follows  : — The  512th  man  from  the  bottom  is  the  82nd  man  in 
the  group  between  $.85  and  $.95,  for  there  are  430  earning  less 
than  $.85;  this  group  contains  169;  if  they  were  distributed 
regularly,  17  to  each  cent,  the  82nd  man  would  be  half-way 
through  this  group,  between  $.89  and  $.90.  The  hypothesis 
of  even  distribution  is  sufficiently  correct  for  most  purposes, 
and  this  method  affords  a  sufficiently  accurate  means  of 
determining  the  wage  of  the  workers  at  the  tenth  places.  The 
resulting  figures  are  given  on  p.  70.  If,  however,  we  want  to 
know  the  wage  of  the  half-way  man  more  exactly,  we  see  from 
the  half-dollar  groups  that  it  is  between  $1.25  and  $1.75,  a 
rough  approximation  shows  it  to  lie  probably  between  $1.45 
and  $1.55,  and  then  we  rapidly  turn  through  our  original  data, 
isolating  the  wages  at  $1.46,  $1.47,  .  .  .  $1.55.* 
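The estimate for the 512th man is a linear interpolation within his 10-cent group. A minimal sketch under the hypothesis of even distribution; the function and parameter names are illustrative, and the figures (430 below $.85, 169 in the group) are those quoted above.

```python
def wage_at_rank(rank, below, in_group, lower, width):
    """Estimate the wage of the worker standing `rank`-th from the
    bottom, supposing the `in_group` workers of his group to be
    spread evenly over a group of the given `width` starting at
    `lower`, with `below` workers earning less than `lower`."""
    place_in_group = rank - below
    return lower + width * place_in_group / in_group


wage_512 = wage_at_rank(rank=512, below=430, in_group=169,
                        lower=0.85, width=0.10)
print(round(wage_512, 3))
```

The 512th man is the 82nd of the 169, a little under half-way through the group, so that the estimate falls between $.89 and $.90.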

A slight modification of this method is also useful. Take
the average of the lowest 512 (or tenth), namely, $.70½; of the
next, namely, $1.03; and so on (see p. 70). These figures also
give  a  vivid  view,  and  are  very  convenient  for  comparisons 
with  other  groups. 

The  figures  so  far  apply  to  only  half  of  the  data  in  the 
Senate  Report.  On  p.  70  the  whole  are  tabulated  to  give 
the  average  wages  of  the  successive  tenths.  A  comparison  of 
the  two  groups  so  obtained  shows  how  far  the  first  half  was 
typical  of  the  whole. 


*  On  this  method  see  pages  102-7. 
Tabulation of Wages: American Figures, 1891.

[Columns 1 and 2: daily wages in 10-cent groups, from "as much
as $.25 and less than $.35" to "as much as $5.25 and less than
$5.35," with the number of persons in each group; the entries of
column 2 are not legible.]

  3. Earning Daily Wages ($)          4. No. of
  as much as     and less than        Persons.

     .25             .45                  16
     .45             .65                 144
     .65             .85                 270
     .85            1.05                 370
    1.05            1.25                 989
    1.25            1.45                 557
    1.45            1.65                 538
    1.65            1.85                 531
    1.85            2.05                 331
    2.05            2.25                 310
    2.25            2.45                 134
    2.45            2.65                 209
    2.65            2.85                 165
    2.85            3.05                 144
    3.05            3.25                  52
    3.25            3.45                  12
    3.45            3.65                 226
    3.65            3.85                  27
    3.85            4.05                  82
    4.05            4.25                   3
    4.25            4.45                   0
    4.45            4.65                   4
    4.65            4.85                   0
    4.85            5.05                   8
    5.05            5.25                   0
    5.25            5.35                   1

  Totals                               5,123

  5. Earning Daily Wages ($)     6. No. of   7. Percentage.   8. Average    9. Instead
  as much as    and less than    Persons.                     Wage in       of ($).
                                                              Group ($).

     .25            .75              317          6.2            .62           .50
     .75           1.25            1,472         28.7           1.09          1.00
    1.25           1.75            1,297         25.3           1.49          1.50
    1.75           2.25              970         18.9           1.99          2.00
    2.25           2.75              506          9.9           2.53          2.50
    2.75           3.25              198          3.9           3.04          3.00
    3.25           3.75              254          5.0           3.51          3.50
    3.75           4.25               96          1.9           4.00          4.00
    4.25           4.75                4          0             4.50          4.50
    4.75           5.25                8           .2           5.00          5.00
    At 5.35                            1                        5.35          5.25

  Totals                           5,123        100
  Average Wage                                                               $1.70
  Average Wage of                           Same for 10,000
                                            Workers.

  Lowest tenth            $.70                   .79
  Second    "             1.03                  1.00
  Third     "             1.18                  1.24
  Fourth    "             1.28                  1.50
  Fifth     "             1.44                  1.50
  Sixth     "             1.59                  1.88
  Seventh   "             1.86                  2.00
  Eighth    "             2.14                  2.22
  Ninth     "             2.59                  2.58
  Highest   "             3.51                  3.55

  General Average         1.73½                 1.82


  Wages of “Tenth” Men (deciles).

  Lowest Wage             $.25
  1/10th up Group          .89
  2/10ths    "            1.12
  3/10ths    "            1.22
  4/10ths    "            1.39
  5/10ths    "            1.49
  6/10ths    "            1.75
  7/10ths    "            1.99
  8/10ths    "            2.36
  9/10ths    "            2.98
  Highest Wage            5.35

The tabulation of the data collected for the Wage Census
of 1886 on such forms as that on p. 71, illustrates well some of
the difficulties involved. The items given on the main part
of the schedule are of this kind:

                        No.     Average Wage.
  Spinners, Time    :    6   :      12s.       :   56½ hours.

Tabulation in the wage census.

Such returns are not perfectly definite, for if many are
employed in the same occupation in a mill, it is possible that
they will earn at different rates. Thus this entry of 6 : 12s.
might arise from either 6 men each earning 12s., or 2 at 10s.,
2 at 12s., 2 at 14s. (average 12s.); or 4 at 12s., 1 at 15s.,
1 at 11s.; or 5 at 12s. and 1 at 18s., 12s. being the general
rate, but not the average, in these last two alternatives. Since
the purpose of the wage census was to give a comprehensive
account of wages adapted for use in all investigations, it
should show the numbers in all trades and subdivisions of
employment by age, sex, and district, the average and general
rate of pay for each group, and sufficient details to show the
distribution about the average in each group, for a mere average
may conceal exceptionally high or exceptionally low wages.
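The four alternatives named above can be set side by side to show how one printed line may cover quite different facts. This is a small illustration only; the dictionary labels and variable names are chosen here, and wages are in shillings.

```python
from statistics import mean

# Four sets of six individual wages that might lie behind "6 : 12s."
alternatives = {
    "6 at 12s.": [12] * 6,
    "2 at 10s., 2 at 12s., 2 at 14s.": [10, 10, 12, 12, 14, 14],
    "4 at 12s., 1 at 15s., 1 at 11s.": [12] * 4 + [15, 11],
    "5 at 12s., 1 at 18s.": [12] * 5 + [18],
}

averages = [mean(wages) for wages in alternatives.values()]
paid_12s = [wages.count(12) for wages in alternatives.values()]

for label, avg, n in zip(alternatives, averages, paid_12s):
    print(f"{label:<34} average {avg:6.2f}s.   {n} paid exactly 12s.")
```

In the first two cases the average is 12s.; in the last two 12s. is the rate paid to most of the men but the average is higher.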

On inquiry at the Labour Department as to whether the
original information had been given in a more detailed form
than the line above, or whether divergencies might be concealed,
the author learnt that the subdivision of occupations
had been carried to such an extent, that in practice, where
there was any great variation in the wages of workers under
one heading, that heading had been split up, so that each
group was separately entered, or that several groups were
distinguished under one heading; and that when there was
reason to believe from the light of other returns that this had
not been done, supplementary inquiries were made on this
point, so that the original data were detailed enough for any
requisite fineness of tabulation.

The  problem  then  was  to  tabulate  the  answers  from  the 
various  factories  in  a  district,  to  show  clearly  and  succinctly 
the  distribution  of  wages  in  each  subdivision  and  in  the 
whole.  It  can  hardly  be  said  with  confidence  that  the  method 
adopted,  of  which  a  specimen  is  given  on  p.  71,  is  entirely 
satisfactory. 

To  clear  our  ideas  let  us  suppose  that  the  details  on  which 
the  line  relating  to  throwsters  (time)  was  based  were  as 
follows  : — 


   3 earning 14/          “average minimum rate.”
  14 earning 15/
   6 earning 15/6
  20 earning 16/    \
  10 earning 17/6    |
  20 earning 18/     |    68 within 10 per cent. of the average
   8 earning 18/6    |    for all, which is 17/7.
  10 earning 19/    /
  10 earning 20/6   \     18 earning 20/11 on the average.
   8 earning 21/5   /
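The three figures attached to this list can be recomputed from the supposed details. The sketch works in pence and rounds results to the nearest penny; the helper names are illustrative, and taking the two highest rates as the maximum group follows the "18 earning 20/11" of the list.

```python
# (number of workers, shillings, pence) for each supposed rate
returns = [
    (3, 14, 0), (14, 15, 0), (6, 15, 6), (20, 16, 0), (10, 17, 6),
    (20, 18, 0), (8, 18, 6), (10, 19, 0), (10, 20, 6), (8, 21, 5),
]


def in_pence(shillings, pence):
    return 12 * shillings + pence


def as_rate(pence):
    """Write a sum in pence as shillings/pence, to the nearest penny."""
    s, d = divmod(round(pence), 12)
    return f"{s}/{d}" if d else f"{s}/"


workers = sum(n for n, _, _ in returns)
average = sum(n * in_pence(s, d) for n, s, d in returns) / workers

# workers whose rate lies within 10 per cent. of the general average
within_tenth = sum(
    n for n, s, d in returns
    if abs(in_pence(s, d) - average) <= 0.10 * average
)

# the maximum group: the two highest rates
top = returns[-2:]
top_average = (sum(n * in_pence(s, d) for n, s, d in top)
               / sum(n for n, _, _ in top))

print(workers, as_rate(average), within_tenth, as_rate(top_average))
```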


Various methods possible.

The process adopted in the tabulation may be supposed to
have been to separate from the whole group of returns a small
group of old men or inferior workers earning far below the
average, and enter them as a distinct minimum group, and to
separate a small group of the most skilled workers and enter
them as a maximum group. This is better than giving simply
the highest and lowest of the individual wages, for either of
these may be due to exceptional circumstances, and may be
quite a long way from that paid to any other person. The exact
size of these extreme groups must be determined from
inspection of the returns themselves. After this has been done,
the remaining wages may not be grouped close together; in the
example taken they are scattered between 15s. and 19s. To give
some clue as to this distribution the number earning within
10 per cent. of the average is stated; this is probably the best
way if only one column can be devoted to it, but 10 per cent.
is a wide limit to adopt. Another method would be to give the
limits within which the wages of the 10 per cent. of the earners
above and 10 per cent. below the average were contained: in
this case 16s. and 18s.

If,  however,  not  more  than  8  columns  are  to  be  devoted  to 
each  group,  the  following  arrangement  would  give  much  more 
definite  information,  and  it  could  have  been  made  from  the 
data  in  hand,  and  would  be  well  adapted  for  all  the  purposes 
for  which  it  would  be  required. 


  Number employed                                          109
  Average weekly rate                                      17/7
  One-tenth of the number of wage-earners
      received not more than                               15/
  One-quarter of the number of wage-earners
      received not more than                               16/
  One-half of the number of wage-earners
      received not more than                               18/
  One-quarter received not less than                       19/
  One-tenth received not less than                         20/6


This  method  was  used  in  the  publications  of  the  wage- 
census  of  1906,  except  that  the  tenths  were  not  given. 
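The entries of the arrangement suggested above can be derived from the supposed throwster returns by counting up from the lowest wage, or down from the highest, until the stated fraction of the 109 workers has been passed. The sketch below is one way of doing this; the function names are hypothetical and wages are held in pence (15/ = 180, 20/6 = 246, and so on).

```python
# (number of workers, weekly rate in pence), lowest rate first
returns = [
    (3, 168), (14, 180), (6, 186), (20, 192), (10, 210),
    (20, 216), (8, 222), (10, 228), (10, 246), (8, 257),
]
total = sum(n for n, _ in returns)


def first_rate_reaching(fraction, ordered):
    """Run through `ordered` and return the rate at which the running
    count first reaches the given fraction of all the workers."""
    needed = fraction * total
    running = 0
    for n, rate in ordered:
        running += n
        if running >= needed:
            return rate
    raise ValueError("fraction must lie between 0 and 1")


def received_not_more_than(fraction):
    return first_rate_reaching(fraction, returns)


def received_not_less_than(fraction):
    return first_rate_reaching(fraction, reversed(returns))


summary = {
    "one-tenth not more than": received_not_more_than(0.10),
    "one-quarter not more than": received_not_more_than(0.25),
    "one-half not more than": received_not_more_than(0.50),
    "one-quarter not less than": received_not_less_than(0.25),
    "one-tenth not less than": received_not_less_than(0.10),
}
print(summary)
```

Because the wages fall at a few distinct rates, each figure is simply the rate of the worker standing at the required place in the row.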

After  studying  Chapters  V  and  VI,  readers  will  naturally 
replace  the  phrases  used  above  by  the  terms  median,  quartiles 
and  deciles,  and  consider  whether  one  of  the  measures  of 
dispersion  would  not  be  more  appropriate  to  use  than  the 
details  here  suggested. 

The general summary.

We are fortunately not dependent solely on the tabulation
as given above, for wages in industries as a whole are also
tabulated for 1886 * on the following plan, which is in a form
most useful for purposes of comparison (p. 74).

The lines giving percentages are very helpful. We can at a
glance compare the levels of wages in different industries. Thus
in the cotton manufacture the average wage is 2s. higher than
in the woollen; and in the cotton there is a large group of
highly skilled workers earning from 30s. to 35s., while in the

*  More  detail  is  shown  in  the  Reports  for  1906. 


Number  and  Percentage  of  Persons  Employed  at  Various  Rates  of  Wages.* 

Table  showing  the  average  Normal  Wages  paid  to  men  in  the  undermentioned  employments,  and  the 

Number  and  Proportion  of  men  paid  at  different  rates,  at  October  1886. 
* General Report on Wages (C. 6889 of 1893).

woollen nearly half are close to the average, earning between
20s. and 25s. In the jute and linen manufactures the averages
are nearly the same, but in the former a larger proportion are
below the 15s. limit. In the silk manufacture there is an
aristocracy as in the cotton, but it is smaller and better paid,
for 12 per cent. earn more than 35s. This table is a masterpiece
of concentration and clearness.

Tabulation of change of wages returns.

We will discuss next the tabulation of the figures relating
to changes in rates of wages collected by the Labour
Departments. The following examples are taken from the earliest
report; the form of the tables has been modified many times
since then, and a study of these alterations can be usefully
followed by turning through a file of the annual reports. The
details collected on the earlier blank forms show the
occupations and numbers affected, the dates from which the
changes took place, and the wages and hours in a full week
exclusive of overtime (a definition corresponding exactly to
that used for the wage census) before and after the change.


Extract from Table showing the Changes in Rates of Wages and
Hours of Labour of Ordinary Agricultural Labourers in Various
Districts of the United Kingdom in 1894, so far as reported to
the Board of Trade.*

Number: No. of Male Agricultural Labourers, Farm Servants,
Shepherds, Horsekeepers, Horsemen, Teamsters, Carters, in ’91.

                         Particulars of Changes in             Particulars of Changes in
                         Summer Wages.                         Winter Wages.
                         (1894 compared with 1893.)            (1894 compared with 1893.)
County and Union.                                                                                  Number.
                         Increase     Decrease                 Increase        Decrease
                         per Week.    per Week.                per Week.       per Week.

Lincolnshire:
  Gainsborough             ...        ...                      ...             1/6 (15/ to 13/6)    2,466
  Louth                    ...        ...                      ...             1/6 (13/6 to 12/)    3,932
  Spilsby                  ...        ...                      ...             1/6 (13/6 to 12/)    3,288
Norfolk:
  Aylsham                  ...        1/ (12/ to 11/)          ...             ...                  2,576
  Docking                  ...        6d. (12/6 to 12/)        1/ (10/-11/)    ...                  2,487
  Flegg, East and West     ...        1/ (12/ to 11/)          ...             1/ (11/ to 10/)      1,108
  Forehoe                  ...        ...                      ...             1/ (11/ to 10/)      1,448

* From the second Annual Report on Changes of Wages, pp. 198-9; a little
compressed.
Extracts from Table showing the Changes in Rates of Wages
of Ordinary Agricultural Labourers in Various Districts of the
United Kingdom in the Summer of 1895, so far as reported to
the Board of Trade.*

Number: No. of Male Agricultural Labourers, Farm Servants,
Shepherds, Horsekeepers, Horsemen, Teamsters, Carters, in 1891.

                                             Particulars of Changes in       Weekly Rate of Wages
                                             Summer Wages (1895 compared     in Summer.
County and Union.                 Number.    with 1894). Decreases in
                                             italics.                        1894.        1895.
                                             Per Week.                       s. d.        s. d.

Durham:
  Stockton*                          437     Decrease of 6d.                 17 6         17 0
  Teesdale (Barnard Castle
    Rural Dist.)*                    669†    Advance of 6d.                  17 6         18 0
Oxfordshire:
  Headington                       1,118     Decrease of 1s.                 12 0         11 0
  Henley (Hambleden Rural
    Dist., Bucks)                  1,587†    Decrease of 1s.                 12 0 to      11 0 to
                                                                             14 0         13 0
Norfolk:
  Flegg, East & West               1,108     Decrease of 1s.                 11 0         10 0
  Forehoe                          1,448     Decrease of 1s.                 11 0         10 0
  Henstead                         1,504     Decrease of 1s.                 11 0         10 0
  Mitford and Launditch            3,622     Decrease of 1s.                 11 0         10 0
  Smallburgh                       2,264‡    Decrease of 1s.                 11 0         10 0
  Swaffham                         1,942     Decrease of 1s.                 11 0         10 0
  Wayland                          1,535     Decrease of 1s.                 11 0         10 0
Carnarvonshire:
  Carnarvon (Gwyrfai Rural         1,124†    Labourers without food,
    Dist.)                                   advance of 1s.                  19 0         20 0
                                             Labourers with food,
                                             advance of 1s.                  11 0         12 0

* Agricultural labourers in this district are hired in March and April for a
year certain, and the change noted applies to the whole year, and not to the
summer only.

† The number of agricultural labourers, etc., is for the Poor Law Union,
but the change applies to the Rural District only.

‡ This number is partly estimated.


Agricultural wages: Change

The adjoining tables give examples of the way in which
the changes in agricultural wages were tabulated in the Second
and Third Report on Changes in Rates of Wages and Hours of
Labour. In the first table space is wasted by devoting separate
columns to increases and decreases, with the intention of
making the table distinct; while it is not clear whether
“Winter 1894” means the winter beginning in or that ending in
that year.

* From the third Annual Report on Changes of Wages, pp. 118, 119, 121
(typography adapted).

In the second table, which refers to summer wages only,
the columns are rearranged; and increases and decreases
printed in the same column, the latter in italics. In the Fifth
Report all the information is printed in a clearer way, thus:

Winter Wages.*

                          Weekly Rates.             Increase or Decrease per
                                                    Week in 1897.
District.     Number.     Jan. '96.    Jan. '97.    Increase.    Decrease.
                          s. d.        s. d.        s. d.
Tendring      3,113       10 0         11 0         1 0          ...

The  tabulation  is  repeated  for  the  summer. 

The  weakness  in  these  agricultural  returns  is  in  the  numbers 
column.  In  the  returns  from  other  industries  the  numbers 
given are those actually affected, but in this case
it is not found possible to obtain this number
correctly, and the number entered is that found under “agricultural
labourers” in the 1891 census, which includes the
various  categories  as  given  in  the  above  table.  When  a  change 
of  wages  takes  place  in  a  rural  district,  we  may  perhaps  assume 
that  it  is  likely  to  be  general,  though,  if  it  was  a  reduction,  it 
might  not  be  made  by  the  better  employers ;  and  though  the 
change  will  not  take  place  in  the  same  week  throughout  the 
district,  there  is  not  likely  to  be  much  variation  in  this  respect. 
The  change  was  generally  made  at  the  time  that  winter  wages 
gave  place  to  summer,  or  summer  to  winter;  and  a  slight 
increase  or  decrease  may  take  place  by  making  the  winter 
reduction  or  the  summer  advance  later  than  usual.  On  the 
whole,  little  error  will  be  introduced  by  assuming  that  the 
change  stated  affects  all  the  adult  agricultural  labourers  in 
the district, and it is quite probable that a proportional change†
will take place in the wages of horsekeepers, shepherds, and
others, though it may not in the case of boys, or old men
who are earning less than the district rate. The question,
“Approximate number of able-bodied labourers in parish?”
is asked on the inquiry form, but as the answers are not used,
it may be assumed that they are generally not given with
sufficient exactness.

* From the fifth Annual Report on Changes of Wages, p. 145.

† On these points see Mr. Wilson Fox's Report on Wages and Earnings
of Agricultural Labourers, 1900, p. 50, and pp. 111-157.

The  object  of  the  whole  tabulation  is  to  show  the  change  in 
the  national  weekly  wages  bill,  but  many  details  are  lacking  for 
the  complete  calculation.  In  the  case  of  agricultural  labourers, 
we  need,  in  addition  to  these  data,  accurate  statements  of  the 
change of additional earnings, special payments,
and payment in kind. In all cases we need a
more  complete  account  of  the  whole  wage-bill  as  well  as  the 
change.  For  agricultural  labourers  the  material  has  been 
published  by  the  Labour  Department ;  *  every  year  it  received 
returns  from  most  of  the  600  unions  as  to  wages  at  all  seasons, 
whether  there  has  been  a  change  or  not. 

The  looseness  in  the  returns  as  to  numbers  does  not  prevent 
our calculating the change in the county or country rates, for
the numbers in each district affected by the
change may be expected to bear the same proportion
to the numbers given in the census returns, as the
number  of  agricultural  labourers  of  the  same  class  in  the 
whole  county  or  country  does  to  the  census  number,  and  we 
are  helped  by  the  principles  of  weighted  averages  discussed 
in  the  next  chapter. 

The  calculation  for  Durham  in  the  above  table  for  the 
changes  in  summer  wages  1894-95  may  be  performed  as 
follows  : — 


              Average before    Change.    Proportional        Amount of change
              change.                      number affected.    on wage-bill.
              s. d.                                            s. d.
Stockton      17 6              -6d.       4                   -2 0
Teesdale      17 6              +6d.       7                   +3 6

Total change in county, +1s. 6d.
Proportional number in county, 73.
Effect on county average, +18d./73, or about +¼d.
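The Durham calculation can be retraced in a few lines. What follows is only a sketch under stated assumptions: the function name `effect_on_county_average` is hypothetical, all money is expressed in pence, and the numbers affected are the census figures taken to the nearest hundred, as in the table.

```python
from fractions import Fraction


def effect_on_county_average(changes, county_number):
    """Return the change in the county average wage, in pence per head.

    changes       -- list of (change in pence, proportional number affected)
    county_number -- proportional number of labourers in the whole county
    """
    # Each district adds (change x number affected) to the county wage-bill.
    total_change = sum(change * number for change, number in changes)
    # Spreading the total over the whole county gives the effect on the average.
    return Fraction(total_change, county_number)


# Durham: Stockton -6d. on 4 (hundreds), Teesdale +6d. on 7 (hundreds);
# proportional number in the county, 73 (hundreds).
durham_effect = effect_on_county_average([(-6, 4), (6, 7)], 73)

print(durham_effect)                   # exact fraction of a penny
print(round(float(durham_effect), 2))  # the same, as a decimal
```

Exact fractions are used so that the quotient 18/73 of a penny is not blurred by rounding before it is read off as roughly a farthing.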

* On these points see Mr. Wilson Fox's Report on Wages and Earnings
of Agricultural Labourers, 1900, p. 50, and pp. 111-157.

Here, for simplicity of calculation, the numbers affected are
taken to the nearest 100, a process which is not likely to affect
the average perceptibly.* This rough method is likely to give
the result as accurately as the original data make possible. A
similar process with suitable modifications can be applied to
the changes tabulated for other industries. The summary of
such returns for agriculture for all counties is as follows:


Comparison of the Net Effect of the Changes of Cash Wages
per Week paid in the Years 1896 and 1895 in certain Districts
in England and Wales.†

                                   Wages in 1896 as compared         Wages in 1895 as compared
                                   with 1895.                        with 1894.

                                               Net Effect of Changes             Net Effect of Changes
                                               on Weekly Wages.                  on Weekly Wages.
                                               Increase (+) and                  Increase (+) and
                                   Total       Decrease (-).         Total       Decrease (-).
District.                          Number.**   Total.    Per Head.   Number.**   Total.    Per Head.
                                               £         s. d.                   £         d.
England:
  Northern Counties                 5,662        -43     -0 1¾        3,766        +44     +2¾
  Yorkshire, Lancashire,
    and Cheshire                    2,897       +100     +0 8¼        3,942       -126     -7¾
  Eastern and Midland Counties     69,869       +666     +0 2¼       89,576     -2,045     -5½
  Southern and Western Counties    20,901       -340     -0 4        20,441       -575     -6¾
Wales                                 ...        ...      ...         2,165        +73     +8
Total                              99,329       +383     +0 1       119,890     -2,629     -5¼

** The number given is the total of male agricultural labourers, farm servants, shepherds, horsekeepers,
in 1891, in the Poor Law Unions in which the changes took place.

* The corresponding calculations for Oxfordshire are:

    12/    -1/    11    -11/
    13/    -1/    16    -16/
                        -27/
    Effect on county average, -27/ ÷ 161 = -2d.

For Norfolk:

    12/    -1/    134   -134/
    Effect on county average, -134/ ÷ 425 = -4d.

† From the fourth Annual Report on Changes of Wages, p. xliv.
The value of this table is not obvious. It seems of little
importance to know how many persons were affected altogether;
though it is of some value to learn from
a previous table that 58,578 persons received
increases, and 40,751 decreases in 1896. This total of persons
affected is constantly given in these tables; if a person receives
an increase of 1s. one month, and loses it the next, he is counted
as 2, and his contribution to the next column (net effect of
change) is zero. This -£43 may mean that 2000 persons
received a decrease of 1s. each, and the remaining 3662 (same
or different persons) an increase of 3¾d. each, or any other
figures which would give the same total. The change per
head in the next column is unimportant; it only shows an
arithmetical quotient with no concrete meaning that can be
expressed in words. If it was replaced by another quotient,
viz., -£43/n, where n is the number of agricultural labourers in
the Northern Counties, we should know the effect on average
wages. In fact, the table would be more useful thus:


Approximate Effect of Changes on National Weekly Wage Bill.

             Increases.           Decreases.
             No.                  No.                  Net        Total No.    Average
District.    affected.  Total.    affected.  Total.    Change.    Employed.    Change.

The figures given supply an example of the common practice
of carrying out into detail a calculation which depends
originally  on  incorrect  numbers,  in  this  case  the  number 
employed,  and  is  therefore  misleading  throughout.  Till  the 
average  (useless  here  in  any  case)  is  taken,  the  error  in  this 
quantity  has  no  injurious  effect.  As  shown  above,  the  average 
here  given  could  be  replaced  by  another  which  would  be  of 
use,  and  which  would  be  correct  within  limits  that  could  be 
defined,  and  would  be  narrow  enough  for  most  purposes. 

Further,  since  the  column  of  numbers  affected  is  admittedly 
wrong, the figures should be given to the nearest 1000 rather
than to units, even if no attempt was made to estimate the
new  figure;  “  between  5000  and  6000  are  affected  ”  is  a  more 
useful  and  correct  statement  than  “  5662  persons  belonged  in 
1891  to  a  class  in  some  undefined  way  connected  with  that 
in  question  in  1896.” 

Since  the  introduction  of  the  minimum  wage  in  agriculture 
the  whole  problem  has  been  modified  and  simplified.  The 
foregoing  analysis,  however,  still  illustrates  the  adaptation  of 
tabular  methods  to  difficult  and  imperfect  data,  and  shows 
how  records  of  wage-changes  in  general  were  handled  officially 
for  at  any  rate  twenty  years. 

The  discussion  of  Group  C,  the  tabulation  of  non-numerical 
answers,  must  be  postponed  till  we  have  analysed  the  nature 
and  use  of  averages. 


CHAPTER  V. 
AVERAGES. 


It  is  natural,  in  a  book  with  the  present  title,  to  allot  a 
considerable  space  to  averages.  By  the  use  of  averages 
complex  groups  and  large  numbers  are  presented  in  a  few 
significant  words  or  figures ;  and  thus  the  two  definitions  of 
statistics,  the  Science  of  Averages  and  the  Science  of  Large 
Numbers ,  are  reconciled. 

Some writers have attempted to draw a distinction between
averages and means, but no general agreement has been reached
as to the exact senses in which the words are
to be separately applied.* The best distinction
may be made by deciding that an average is a purely arithmetical
conception, such as the average length of life in a
varied  population,  which  does  not  correspond  to  any  particular 
group,  but  is  only  a  short  way  of  expressing  an  arithmetical 
result; while the word “mean” is to be applied to some
objective  quantity,  such  as  the  mean  height  of  Englishmen, 
about  which  all  height-measurements  are  grouped  in  a  definite 
way.  If  this  terminology  is  adopted,  most  of  the  discussion 
under  A,  B,  C  in  the  sequel  applies  to  “  averages  ”  and  under 
D,  E  and  F  to  “  means.” 

A.  Arithmetic  Averages. — We  may  rapidly  pass  by 
some  of  the  common  uses  of  the  word  “  average,”  and  pick 
out  those  which  will  prove  of  use  in  statistics.  An  average  is 
sometimes  used  merely  to  avoid  big  numbers.  The  average 
weight  of  the  University  crew  is  given,  only  because  it  is  more 
usual  to  speak  of  a  man’s  weight  being  12J  stone  than  of  eight 
men’s  weight  being  I2f  cwt.,  and  it  is  easier  to  connect  the 
former  with  men’s  weight  in  general.  Similarly,  if  we  are 
comparing the value of the exportations of some commodity
in two periods of ten years each, we should say that the yearly
average in the period 1870-79 was £10,000,000, and in 1880-89
was £11,000,000, rather than that the totals were £100,000,000
and £110,000,000. This leads to the second ordinary use of
the word. If we were comparing the ten years
1870-79 with the eleven years 1880-90, and the
totals in the periods were £100,000,000 and £132,000,000
respectively, we should obtain no grasp of the difference till
we had reduced them to a common denominator by dividing
by the number of years, and found that the averages in the
two periods were £10,000,000 and £12,000,000. This class of
averages is well known in cricket; sometimes the total number
of runs made or wickets taken by each cricketer are stated
also, but these are rather as so-called statistical curiosities
than as having much bearing on the skill or luck of the players.
The numbers by which the seasons' performances are judged
are the quotients of the number of runs by the number of
innings, of the number of wickets by the number of runs, and
so on, all quantities being reduced to a common denominator.
The average in this sense is very common in mechanics. The
average pressure per square inch, the average work done by
an engine per minute, the average speed of a train, are quantities
which it is frequently necessary to use. Such an
expression as the average rate of interest is precisely similar.

* Compare the article “Moyenne,” by Dr. Bertillon, in Dictionnaire
encyclopedique des Sciences Medicales, with this chapter. See also the
paper by Dr. Venn in the Statistical Journal, 1891, and chap. xviii. in his
Logic of Chance.

It  will  be  clear  that  percentage  is  a  special  case  of  this 
use  of  average.  It  is  useless  when  comparing  the  growths 
of population or of trade to give only the whole
numbers. An increase of 50,000 in the population
of London is not so significant as one of 10,000 in that of
Harrow ;  they  must  be  expressed  as  increases  of  1  per  cent, 
and  60  per  cent.,  say,  before  their  meaning  can  be  appreciated, 
and  this  is  the  same  thing  as  giving  the  average  increase  to 
100  inhabitants.  For  this  reason  the  records  of  births,  deaths, 
and  marriages  are  always  given  as  rates — so  many  per  1000 
inhabitants ;  and  in  these  cases  a  double  average  is  given,  for 
the  rates  signify  so  many  per  1000  inhabitants  per  annum. 

Another extension of the same use is found when quantities
are reduced to rates “per head” of the population. This
use  is  solely  for  comparison,  and  the  principle  employed  is 
that  of  the  common  denominator.  It  would  be  futile  to  state 

that the amount spent on drink was, say, £100,000,000 in
1860 and £110,000,000 in 1890; but the corresponding statements
that the amounts were £3 10s. per head in 1860 and
£2 15s. per head in 1890 would make a comparison possible.
In preparing any comparative summary of figures, it is always
necessary to consider whether such an average should be taken.

So far, the averages considered are simply
arithmetical, and satisfy the following definition:

Average × number to which it applies = total quantity dealt with,
e.g. Average weight × number of crew = Total weight of crew.

The following question, however, will lead us further. The
average weekly agricultural wages in 1892 in
Wilts, Dorset, Devon, Cornwall, and Somerset
were 10s., 10s., 13s. 6d., 14s., 11s. respectively. What was
the average in the south-west of England?

The simplest method is to say, the average was

$$\frac{10s. + 10s. + 13s.\ 6d. + 14s. + 11s.}{5} = \frac{58s.\ 6d.}{5} = 11s.\ 8.4d.,$$

and for many purposes this would be sufficient; but it does
not satisfy the above definition. For when we ask the double
question “11s. 8.4d. multiplied by what number equals what
total?”, we can only answer that 11s. 8.4d. multiplied by the
number of items equals the sum of items.
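The arithmetic of this simple average may be checked as below. This is a minimal sketch, assuming the old coinage of 12 pence to the shilling; the helper names `to_pence` and `format_sd` are hypothetical and are introduced only for the illustration.

```python
def to_pence(shillings, pence=0):
    """Convert a sum in shillings and pence to pence (12d. = 1s.)."""
    return 12 * shillings + pence


def format_sd(pence):
    """Write a sum in pence as shillings and pence."""
    shillings, remainder = divmod(pence, 12)
    return f"{int(shillings)}s. {remainder:g}d."


# Average weekly agricultural wages in 1892, as quoted in the text.
county_wages = {
    "Wilts": to_pence(10),
    "Dorset": to_pence(10),
    "Devon": to_pence(13, 6),
    "Cornwall": to_pence(14),
    "Somerset": to_pence(11),
}

# Simple (unweighted) average: the sum of the items over the number of items.
simple_average = sum(county_wages.values()) / len(county_wages)

print(format_sd(sum(county_wages.values())))  # total of the five items
print(format_sd(simple_average))              # their simple average
```

The only total this average reproduces when multiplied back is the sum of the five county figures, not any total of wages paid, which is the difficulty the text goes on to examine.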

We  must  consider  further  what  we  understand  by  the 
expressions  “  average  wage  in  each  county,”  and  “  average 
wage  in  the  group  of  five  counties.” 

It  may  be  supposed  that  the  average  wage  in  Wilts,  for 
instance,  was  compiled  by  getting  returns  from  different 
villages, say 12s., 11s., 9s., 9s. 6d., 10s. 6d., 9s., 9s., adding
them  and  dividing  by  the  number  of  villages.  This  of  course 
satisfies  our  definition  no  better  than  the  former.  What  is 
to  be  understood  by  the  average  in  each  village?  If  our 
present  definition  is  to  be  satisfied,  it  should  be  the  total  of 
the  wages  paid  in  the  village  divided  by  the  number  of  workers. 
It  is  hardly  necessary  to  say  that  this  total  is  never  found  in 
such  an  investigation,  and  the  average  is  given  from  observation 
or  by  guess-work,  not  by  calculation. 

If,  however,  the  village  average  was  correct,  and  we  had 
returns  from  all  the  villages  in  the  county,  we  should  find 
the  county  average  as  follows  : — 

$$\frac{12/ \times 200 + 11/ \times 150 + 9/ \times 300 + 9/6 \times 150 + 10/6 \times 400 + 9/ \times 200 + 9/ \times 200}{200 + 150 + 300 + 150 + 400 + 200 + 200} = 9/11,$$

where the numbers in the denominator are the numbers of
labourers  in  the  respective  villages.  We  should  then  have  the 
same  result  as  if  we  had  had  the  wages  of  all  the  labourers 
in  the  county  put  down  on  a  sheet,  added  up,  and  divided  by 
their  number,  and  the  average  would  satisfy  the  definition. 

It  is  clear  that  we  can  simplify  this  arithmetical  work, 
for  if  we  divide  throughout  by  50  we  get  the  same  result ; 
this  is  as  if  we  said  there  were  4,  3,  6  .  .  .  labourers  in  the 
villages  instead  of  200,  150,  .  .  .  Thus  we  get  the  same 
result  if  we  take  numbers  proportional  to  the  total  numbers 
of  the  labourers  instead  of  the  actual  numbers.  This  plan 
has  two  advantages  :  first,  that  though  we  do  not  know  the 
numbers  of  labourers,  we  know  numbers  nearly  proportional 
to  them,  viz.,  those  included  in  the  census  returns  under  the 
general  headings  relating  to  agriculture;  and  secondly,  we 
need  not  choose  our  numbers  with  absolute  exactness ;  thus 
the  numbers  of  labourers  above  given  may  be  supposed  to  be 
round  numbers  substituted  for  213,  145,  320  .  .  . ;  and  it  will 
presently  be  seen  that  such  differences  hardly  affect  the 
average.  We  idealise  the  village,  and  suppose  it  to  contain 
round  numbers ;  and  then  for  the  numerical  work  take  simple 
numbers proportional to these. This is important as simplifying
numerical work.
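The village calculation, and the remark that proportional numbers serve as well as actual ones, can be verified together. The sketch below assumes the seven village wages and the seven round numbers of labourers given in the text; the function name `weighted_average` is hypothetical, and wages are written in pence.

```python
from fractions import Fraction


def weighted_average(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    total = sum(value * weight for value, weight in zip(values, weights))
    return Fraction(total, sum(weights))


# Village wages 12/, 11/, 9/, 9/6, 10/6, 9/, 9/ expressed in pence.
wages_pence = [144, 132, 108, 114, 126, 108, 108]
# Round numbers of labourers in the respective villages.
labourers = [200, 150, 300, 150, 400, 200, 200]

county_average = weighted_average(wages_pence, labourers)

# Dividing every weight by 50 gives the proportional numbers 4, 3, 6, ...
proportional_numbers = [n // 50 for n in labourers]
same_average = weighted_average(wages_pence, proportional_numbers)

print(float(county_average))           # in pence
print(county_average == same_average)  # proportional weights change nothing
```

On these figures the average comes to a little under 120 pence, that is, between 9s. 11d. and 10s., which agrees with the 9/11 printed in the text to the nearest penny below.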

Averages obtained for the county in this way do not absolutely
satisfy our definition, but are very nearly equal to
those  that  do.  We  can  then  proceed  to  take  the  average  for 
the  south-west  of  England  on  the  same  principles. 

A  common  case  is  when  the  data  are  given  as  so  many 
instances  in  successive  grades,  as  in  columns  1  and  3  in  the 
following table. To obtain the arithmetic average it is necessary
to make some assumption as to the distribution of the
instances within the grade. It can be shown
that in ordinary cases, especially where the
numbers  tail  off  rapidly  at  both  extremities,  a  high  degree  of 
accuracy  is  obtained  by  setting  out  the  work  as  if  the  numbers 
in  each  grade  were  concentrated  at  the  middle  point  of  that 
grade;  in  fact  the  average  in  each  grade  is  generally  nearer 
the  centre  of  the  group  than  is  the  middle  point  of  that  grade, 
but  the  resulting  errors  on  either  side  of  the  centre  tend  to 
neutralise  each  other.  The  work  is  generally  simplified  by 
taking the breadth of the grade (five years in the table) as
unit, and measuring from an origin selected at the middle
point of a grade in which the entries are numerous (grade
40-45 years, middle point 42½ in the table). The average
distance  from  this  origin  (obtained  by  dividing  the  total  of 
column  4  by  the  number  of  cases)  shows  the  distance  of  the 
average  from  the  origin  in  terms  of  the  unit,  whence  the 
average  is  readily  calculated  on  the  original  scale. 


Ages of Married Men in England and Wales, 1911.

            Middle Points of                                             Cumulation.
            Grade measured      Number of        Product of
Grade.      from Origin,        Married Men      Numbers in       Limiting    Number above
Years.      42½ Years;          per 1000.        Cols. 2 and 3.   Age.        Age in Col. 5.
            Unit, 5 Years.
Col. 1.     Col. 2.             Col. 3.          Col. 4.          Col. 5.     Col. 6.

15-20          -5                    0                 0            15          1,000
20-25          -4                   33              -132            20          1,000
25-30          -3                  112              -336            25            967
30-35          -2                  152              -304            30            855
35-40          -1                  154              -154            35            703
40-45           0                  136                 0            40            549
45-50           1                  118              +118            45            413
50-55           2                   96               192            50            295
55-60           3                   74               222            55            199
60-65           4                   54               216            60            125
65-70           5                   37               185            65             71
70-75           6                   21               126            70             34
75-80           7                    9                63            75             13
80-85           8                    3                24            80              4
85-90           9                    1                 9            85              1
90-95          10                    0                 0            90              0

Totals                           1,000            +1,155
                                                    -926
                                                    +229

Average: 42½ + 229/1000 of 5 = 43.645 years.

Diagrams  illustrating  this  table  are  given  facing  p.  130. 
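The short method used in the table, with the numbers in each grade treated as concentrated at its middle point, may be sketched as follows. The variable names are chosen for the illustration only; the figures are those of columns 2 and 3.

```python
# Married men per 1000 in each five-year grade, from 15-20 up to 90-95 (col. 3).
counts = [0, 33, 112, 152, 154, 136, 118, 96, 74, 54, 37, 21, 9, 3, 1, 0]

origin = 42.5  # middle point of the grade 40-45 years
unit = 5       # breadth of a grade, in years

# Col. 2: middle points measured from the origin in units of five years.
steps = list(range(-5, 11))

# Col. 4 summed: total distance from the origin, in units.
total_steps = sum(n * s for n, s in zip(counts, steps))

# Average distance from the origin, converted back to the scale of years.
average_age = origin + unit * total_steps / sum(counts)

# The same average reached the long way, from the middle points in years.
midpoints = [origin + unit * s for s in steps]
direct_average = sum(n * x for n, x in zip(counts, midpoints)) / sum(counts)

print(total_steps)  # 229
print(average_age, direct_average)
```

Working from an origin near the centre keeps the products in column 4 small, which mattered for hand calculation; the result is the same as that obtained from the middle points in years.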

B.  Weighted  Averages. — This  discussion  introduces  and 
gives  an  example  of  the  very  important  statistical  method 
known  as  “  weighting  the  average.”  We  may  illustrate  it 
further  from  the  same  figures  by  considering  what  weights  to 
apply  to  get  this  average  for  South-West  England.  We  may 
find  the  number  of  agricultural  labourers  in  the  counties 
and work out the average thus:

$$\frac{10s. \times 20{,}000 + 10s. \times 30{,}000 + \dots}{20{,}000 + 30{,}000 + \dots};$$

or we may argue that since we have no means of knowing the
exact numbers of labourers we may as well arrange the weights,
according to the importance of the counties, say 20,000,
30,000, etc., from some other point of view, and take numbers
representing such quantities as the amounts of wheat produced,
the area, or the rate of increase of population. In
this  particular  case  these  methods  would  be  absurd,  but  in 
other  problems  the  weights  are  not  so  obvious.  Suppose,  for 
example,  that  we  are  considering  the  attraction  of  London  on 
the  inhabitants  of  various  counties ;  that  we  are  told  that  so 
many  immigrants  arrive  from  Essex,  Norfolk,  and  Suffolk, 
and  so  many  from  Stafford  and  Worcester,  and  we  are  asked 
to compare the attractive power on the agricultural and manufacturing
counties. Should we weight the numbers given by
the  total  numbers  of  inhabitants  of  the  contributing  counties, 
or  by  their  distance  from  London,  or  by  some  quantity  derived 
from  these  ? 

The idea is made clearer by the mechanical analogy in
which the word weight originated. Suppose a uniform weightless
rigid rod graduated in 100 equal divisions,
and equal weights hung at the 40th, 50th, 60th,
70th, and 80th divisions from one end; the rod will then
balance at a point corresponding to the unweighted average,
60 intervals from the same end. Now, suppose the equal
weights replaced by weights of 7, 1, 3, 2, 4 lbs. respectively,
and the rod will balance at a point corresponding to the weighted
average, 57.1 intervals from the same end. The further any
particular mass is moved, or the heavier it is, the more the
centre of gravity will be shifted; and this clearly corresponds
to the influence we should wish the various wages to have in
the statistical problem. The formula in use in Statics,
$\bar{x} = \dfrac{\Sigma(wx)}{\Sigma(w)}$, which corresponds to the arithmetic on the previous
page, can also be used in Statistics.
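The balancing points of the rod can be computed directly from the formula of Statics. The following is a minimal sketch; the function name `balance_point` is hypothetical.

```python
def balance_point(positions, weights):
    """Centre of gravity of weights hung on a weightless rod:
    the sum of (weight x position) divided by the sum of the weights."""
    moment = sum(w * x for w, x in zip(weights, positions))
    return moment / sum(weights)


positions = [40, 50, 60, 70, 80]  # divisions measured from one end of the rod

# Equal weights give the unweighted average of the positions.
unweighted = balance_point(positions, [1, 1, 1, 1, 1])

# Weights of 7, 1, 3, 2, 4 lbs. pull the balance towards the 40th division.
weighted = balance_point(positions, [7, 1, 3, 2, 4])

print(unweighted)          # 60.0
print(round(weighted, 1))  # 57.1
```

The heavy weight at the 40th division draws the centre of gravity about three divisions below the unweighted point.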

The  discussion  of  the  proper  weights  to  be  used  in  this  and 
other  averages  has  occupied  a  space  in  statistical  literature  out 
of  all  proportion  to  its  significance,  for  it  may  be  said  at  once 
that  no  great  importance  need  be  attached  to  the  special 
choice of weights; one of the most convenient facts of statistical
theory is that, given certain conditions, the
same result is obtained with sufficient closeness
whatever logical system of weights is applied. We must
postpone the complete mathematical analysis of this proposition,
but may offer immediately some algebraic formulae and
arithmetical  illustrations. 


Write $W_1, W_2, \dots, W_n$ for the weights applied to $n$ quantities $M_1, M_2, \dots, M_n$.

Then the weighted average, $M_w = \dfrac{W_1M_1 + W_2M_2 + \dots}{W_1 + W_2 + \dots}$.

Let $m$ be the average of the M's, and let $M_1 = m + m_1$, $M_2 = m + m_2$, ....
Then $nm = M_1 + M_2 + \dots = nm + m_1 + m_2 + \dots$, so that $m_1 + m_2 + \dots = 0$.

Similarly if $w$ is the average of the W's, and $W_1 = w + w_1$, etc., $nw = W_1 + W_2 + \dots$,
and $w_1 + w_2 + \dots = 0$.

Then $(W_1 + W_2 + \dots)M_w = (w + w_1)(m + m_1) + (w + w_2)(m + m_2) + \dots$;

$nw \cdot M_w = nwm + m(w_1 + w_2 + \dots) + w(m_1 + m_2 + \dots) + w_1m_1 + w_2m_2 + \dots$;

$\therefore\ M_w - m = \dfrac{1}{n}\left\{\dfrac{w_1}{w}\,m_1 + \dfrac{w_2}{w}\,m_2 + \dots\right\}$.

The difference between the weighted average ($M_w$) and the unweighted
average $m$ depends therefore on the average of terms such as $\dfrac{w_1}{w}\,m_1$.
The sum of the w's and the sum of the m's is zero, and of the w's and of the
m's many are negative and many positive. It is only when like signs are
more commonly found in a pair of m and w than are unlike signs that the
whole expression for the difference between $M_w$ and $m$ becomes at all
important.
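The identity just obtained may be tested on any small set of figures. The sketch below borrows the positions and weights of the rod in the mechanical illustration, which is a choice made here for convenience and not the author's example; the function names are hypothetical.

```python
from fractions import Fraction


def difference_direct(values, weights):
    """M_w - m computed from the definitions of the two averages."""
    weighted = Fraction(sum(w * v for w, v in zip(weights, values)), sum(weights))
    unweighted = Fraction(sum(values), len(values))
    return weighted - unweighted


def difference_by_deviations(values, weights):
    """M_w - m computed as (1/n) times the sum of (w_i / w) * m_i, where w_i
    and m_i are deviations from the average weight w and average value m."""
    n = len(values)
    m = Fraction(sum(values), n)
    w = Fraction(sum(weights), n)
    m_dev = [v - m for v in values]
    w_dev = [wt - w for wt in weights]
    return sum((wd / w) * md for wd, md in zip(w_dev, m_dev)) / n


values = [40, 50, 60, 70, 80]
weights = [7, 1, 3, 2, 4]

print(difference_direct(values, weights))
print(difference_by_deviations(values, weights))
```

Both routes give the same fraction. The difference is negative here because the largest weight is attached to the lowest value, so that unlike signs predominate among the pairs of deviations.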

In the following table from the Wage Census (see page facing), m is
24s. 2d., $M_w$ = 24s. 7d., n = 38, w = 9600, and writing the weights to the
nearest 100 and the wages in pence we have the following values, the trades
being taken in the order of the table:

     w.      m.       wm.     |      w.      m.       wm.
   +226     +13    +2,938     |    +431     +41   +17,671
    +26     -12      -312     |    +147     -41    -6,027
    -26     -10      +260     |    +184     +36    +6,624
    -28     -53    +1,484     |     -44      +7      -308
    -68     -58    +3,944     |     -34      +4      -136
    -84      -8      +672     |    +321     +19    +6,099
    -74     -23    +1,702     |     +11     +61      +671
    -83     +29    -2,407     |     +19    +111    +2,109
    -85      +3      -255     |     -75      +1       -75
    -90     +37    -3,330     |     -78     +65    -5,070
    -69     -48    +3,312     |     -91     +50    -4,550
    -93     -36    +3,348     |     -93     +75    -6,975
   +578     -15    -8,670     |     -79     +28    -2,212
    -46     -90    +4,140     |     -67      +1       -67
    -66     +10      -660     |     -12      +1       -12
    -27     -25      +675     |     -78     -46    +3,588
    -73     -27    +1,971     |     -64     -16    +1,024
    -56      -4      +224     |     -85     -14    +1,190
    -91     -66    +6,006     |     -74     +12      -888

Sum of 21 positive products, +69,652.

Sum of 17 negative products, -41,954.

Sum of the 38 products = 17,698 = $w_1m_1 + \dots$

$M_w = m + \dfrac{1}{nw}$ of 17,698 = 24s. 2d. + $\dfrac{17{,}698}{38 \times 96}$d. = 24s. 6.8d. = 24s. 7d. to nearest
penny, as in the table.


The  table  on  the  next  page  affords  an  example  of  this 
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Examples  of  the  Smallness  of  the  Change  Introduced  by 
Difference  in  Systems  of  Weighting. 


From  the  Wage  Census,  1886. 

Numbers 

Employed 

Arbitrary 
System  of 
Weights. 

Trade. 

Average 

                                         Wages     Number  Number in [Arbitrary     Equal
                                        (Men).   Included Trade when   weights]  Weights.
                                                in Returns     known.
                                                           Unit 1,000
                                        s.  d.

Cotton Manufacture                      25   3     32,1?9        142        144         1
Woollen Manufacture                     23   2     12,248         54        172         1
Worsted and Stuff Manufacture           23   4      7,005         38        219         1
Linen Manufacture                       19   9      6,807         22         96         1
Jute Manufacture                        19   4      2,799          9         23         1
Hemp, &c., Manufacture                  23   6      1,232          3         78         1
Silk Manufacture                        22   3      2,248         10        189         1
Carpet Manufacture                      26   7      1,292          0        213         1
Hosiery Manufacture                     24   5      1,070          ?        287         1
Lace Manufacture                        27   3        593          8         51         1
Small wares Manufacture                 20   2      2,734          0        225         1
Flock and Shoddy Manufacture            21   2        330          2        2?0         1
Coal, Iron Ore, and Ironstone Mines     22  11     67,429         57        142         1
Metalliferous Mines                     16   6      5,046          0        190         1
Shale Mines and Paraffin Oil Works      25   0      3,021          0        207         1
Slate Mines and Quarries                22   1      6,933     \             232         1
Granite Quarries and Works              21  11      2,315      } 12         206         1
Stone Quarries                          23  10      3,956     /              34         1
China, Clay, &c., Works                 18   8        499          0         3?         1
Police                                  27   7     52,682         58        224         1
Roads, Pavements, and Sewers            20   9     24,276          0         29         1
Gasworks                                27   2     27,965          0         40         1
Waterworks                              24   9      5,187          0        151         1
Pig Iron (Blast Furnaces)               24   6      6,234          0        128         1
General Engineering Iron and Brass
  Foundries and Machinery Trades        25   9     41,658        200        173         1
Shipbuilding, Iron and Steel            29   3     10,661         80        228         1
Tinplate Works                          33   5     11,514          0        178         1
Saw Mills                               24   3      2,088          0        174         1
Brass Works and Metal Wares             29   7      1,838          0        222         1
Shipbuilding, Wood                      28   4        454          0         79         1
Cooperage Works                         30   5        327          0        165         1
Coach and Carriage Building             26   6      1,664          0         28         1
Boot and Shoe Making                    24   3      2,902          0        142         1
Breweries                               24   3      8,366          0         46         1
Distilleries                            20   4      1,795          0        129         1
Brick and Tile, &c., Making             22  10      3,188          0         55         1
Chemical Manure Works                   23   0      1,054          0        210         1
Railway Carriage and Wagon Building     25   2      2,239          0        233         1

Averages                                         24s. 7d.   25s. 3d.  24s. 5½d.  24s. 2d.

(? marks a figure illegible in the source.)
principle,* and is worth careful study.

Example from the wage census.

At the commencement of the Wage Census, circulars were sent to all the
principal firms in all well-located trades, asking
for details as to wages. Of these some were not returned,
and  the  numbers  allotted  in  the  Final  Report  to  each  trade 
are  not  the  numbers  which  actually  belong  to  the  trade  in  the 
whole  country,  but  the  numbers  of  those  in  the  firms  which 
made  returns.  The  average  wage  given  is  not  therefore  the 
arithmetic  average  for  these  trades  for  the  whole  country 
corresponding  to  the  definition  given  above  for  average,  but 
the  average  of  the  average  wages  as  returned  in  each  trade 
weighted  by  the  numbers  for  whom  returns  were  made;  so 
that  the  average  wage  given  for  the  whole  group  of  trades 
might  have  proved  to  be  different,  if  with  the  same  average  in 
each  trade  the  returns  had  been  complete.  It  is  very  unlikely, 
however,  that  there  would  have  been  any  great  difference. 
In the table several systems of weighting are used; the first
are the numbers in these returns, giving an average, 24s. 7d.;
the second are the numbers belonging to each trade according
to the census when they are above a certain minimum, giving
an average 25s. 3d.; the third is a purely arbitrary list of
figures taken from a source which has no connection with
wages, and the average is 24s. 5½d.; the last is the unweighted
average, that is, all the weights are equal, and the average is
now 24s. 2d. These averages are close together, while the
original items vary from 16s. 6d. to 33s. 5d. It is to be noticed
that the true weights are not known in this case, but that
owing to this principle we are able to dispense with them
entirely.
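
The equal-weights average in the last column of the table can be checked by direct arithmetic. The following Python sketch is offered only as an illustration: the wage figures are transcribed from the first column of the table, while the helper names (`to_pence`, `weighted_mean`, `to_s_d`) are invented here and form no part of the original text.

```python
# Average weekly wages (shillings, pence) of the 38 trades, in table order.
WAGES = [
    (25, 3), (23, 2), (23, 4), (19, 9), (19, 4), (23, 6), (22, 3), (26, 7),
    (24, 5), (27, 3), (20, 2), (21, 2), (22, 11), (16, 6), (25, 0), (22, 1),
    (21, 11), (23, 10), (18, 8), (27, 7), (20, 9), (27, 2), (24, 9), (24, 6),
    (25, 9), (29, 3), (33, 5), (24, 3), (29, 7), (28, 4), (30, 5), (26, 6),
    (24, 3), (24, 3), (20, 4), (22, 10), (23, 0), (25, 2),
]


def to_pence(shillings, pence):
    """Convert shillings and pence to pence (12 pence to the shilling)."""
    return 12 * shillings + pence


def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def to_s_d(pence):
    """Round to the nearest penny and return (shillings, pence)."""
    return divmod(round(pence), 12)


values = [to_pence(s, d) for s, d in WAGES]
equal = weighted_mean(values, [1] * len(values))
print(to_s_d(equal))  # (24, 2), i.e. 24s. 2d.
```

Any of the other columns of weights may be passed to `weighted_mean` in place of the list of ones.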

Average under many systems of weights.

The problem dealt with in the next table is to find the
average weekly agricultural wage in England and Wales from
the returns for Michaelmas 1869 and Lady Day 1870, given in
columns 1 and 2. There are very many different ways of taking
this average, some of which are as follows: Take the average of
summer and autumn for each county, as in column 3, and then
the unweighted average of these 45 numbers; this is 12s. 7d.
Suppose the summer wage to be paid twice as long as the
autumn wage, as in column 4, and proceed as before; the
average is 12s. 5J d., the slight difference being due to the
inclusion of harvest payments in the Michaelmas wage, which
makes them higher on the whole than the summer wages.
Again, divide the counties into geographical groups, take the
simple average for each group (the figures marked a in column 3
and b in column 4) and weight these by the figures marked c
in column 5, the numbers of agricultural labourers in each
group; the average of the a figures with the c weights is 12s. 5d.,
of the b figures with the c weights is 12s. 4d. Again, weight
the figures for each county in column 4 with the numbers in
column 5, the most obvious method of all; the average is then
12s. 4d. Again, take the simple average of the district averages
a and b, that is, give each of the eight districts equal weights;
the averages are 12s. 4! d. and 12s. 3 \d. Or take the simple
average of column 3, counting Yorkshire and Wales each as
one county; it is 12s. 8d.

* From the Statistical Journal, December 1897, with corrections.

To obtain new groups, take as weights not the number of
agricultural labourers, but the total population of the districts,
the numbers marked d. Exclude the population of London as
exerting a preponderating influence unconnected with agriculture.
A new factor is now introduced, for population is
greatest in the manufacturing districts, where agricultural
labour is of comparatively little importance, but receives high
wages; these high wages have undue weight, and the average
of the figures b with weights d is brought up to 13s. 1f d. If
column 4 is rewritten correct only to the nearest 1s., and
column 5 to the nearest 10,000, the weighted average is 12s. 5^.
If column 3 is weighted with random numbers quite unconnected
with the problem, viz., the successive digits in the third
decimal places of the logarithms of the numbers 2 to 46, the
average is 12s. 10f^. The reader may try any other system
of logical or absurd weights, and he will find that unless there
is some bias in the selection of weights, or great preponderance
is given to a few counties, the average will be little affected.
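
The arbitrary weights just described (third decimal digits of the logarithms of 2 to 46) are easily generated, and the reader's experiment can be imitated mechanically. The sketch below is an illustration under stated assumptions: the 45 county wages are hypothetical values drawn at random between 8s. 6d. and 19s., not the figures of the original table, and the function names are invented here.

```python
import math
import random


def log_digit_weights(first=2, last=46):
    """Third decimal digit of the common logarithm of each number first..last."""
    return [int(math.log10(n) * 1000) % 10 for n in range(first, last + 1)]


def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# HYPOTHETICAL county wages in pence, between 8s. 6d. (102d.) and 19s. (228d.).
rng = random.Random(0)
wages = [rng.randint(102, 228) for _ in range(45)]

weights = log_digit_weights()
plain = weighted_mean(wages, [1] * len(wages))   # unweighted average
arbitrary = weighted_mean(wages, weights)        # arbitrary digit weights

print(round(plain, 1), round(arbitrary, 1))
```

Since the digits have no connection with the wages, the two printed averages should normally lie close together in comparison with the range of the items; a weighting that favoured the high or the low wages would not behave so.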

Agricultural Wages in 1869-70. To Illustrate Various Methods of Weighting, and their Results.

Since the true system of weights which would reduce the
general average to our definition must be allied to some of
those here adopted, and can hardly show greater divergence
from 12s. 4d. than these do, we may feel confident that the
true average is within, say, 3d. of this figure. The original
items varied from 8s. 6d. to 19s.; the averages, even those
based on the most extravagant methods, are contained by the
limits 12s. and 13s. 1f d. Without some such argument as
this we should have no clue to the magnitude of the error
introduced by erroneous weights.

Weights cannot in general be [...] error in their estimation.

It is never safe, however,
to assume that weights can be neglected, and an unweighted
average used, without first examining the group in question,
trying various systems, and seeing that the resulting average
is stable. This will only be the case if there is no connection
between the size of the quantities and the true magnitude of the
weights. Thus if we are dealing with wages in towns, and are
calculating  the  average  for  all  towns  taken  together,  we  shall 
obtain  too  small  a  result  if  we  ignore  weights  and  count  all 
towns  as  equal,  for  the  higher  wages  are  paid  in  the  larger 
towns.  Thus,  as  on  pp.  118-9  below,  the  average  of  the 
recognised  wages  of  117  branches  of  the  Amalgamated  Society 
of  Engineers  was  32s.  4 d.  in  1891  if  we  count  all  the  branches 
as  equal ;  but  was  up  to  33s.  A^d.  if  we  weight  the  wage  at  each 
of  the  branches  with  the  number  of  members  belonging  to  it. 
But,  though  we  cannot  neglect  weights  entirely  in  such  cases, 
we  need  to  make  only  a  very  rough  estimation  for  them  if 
there  is  no  preponderating  influence  exerted  by  a  small  minority 
of  places.  In  this  case  London,  with  a  wage  higher  than  any 
other  district,  except  Dartford  and  Enfield  Lock,  and  with 
nearly  one-sixth  of  the  total  number  of  members  dealt  with, 
exerts such an influence. If, giving London its due importance,
we take as weights the numbers belonging to the branches
to the nearest hundred, we obtain the average 33s. 6d., practically
the same as before. Each group for which an average
is  to  be  calculated  must  be  treated  on  its  merits;  in  many 
cases  the  weights  may  be  neglected  entirely;  in  nearly  all 
cases,  where  the  group  consists  of  many  items,  even  moderately 
large errors in computing weights may be neglected. Examination
of the data will generally determine the importance of
such  errors. 
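
The two points of this paragraph, that weights connected with the size of the items cannot be ignored but that rough weights serve nearly as well as exact ones, may be seen in a small sketch. The four branches below are hypothetical (they are not the figures of the Amalgamated Society of Engineers), and the function names are invented for the illustration.

```python
def mean(xs):
    """Simple (unweighted) arithmetic average."""
    return sum(xs) / len(xs)


def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# HYPOTHETICAL branches: wage in shillings, and number of members.
wages = [30, 32, 34, 38]
members = [130, 170, 260, 1440]   # the largest branch pays the highest wage

unweighted = mean(wages)                                         # 33.5
exact = weighted_mean(wages, members)                            # 36.45
rough = weighted_mean(wages, [round(m, -2) for m in members])    # weights to nearest 100

# The gap between the two averages is cov(weights, wages) / mean(weights);
# it vanishes only when weights and wages are unconnected.
cov = mean([(w - mean(members)) * (x - mean(wages))
            for w, x in zip(members, wages)])
assert abs(exact - (unweighted + cov / mean(members))) < 1e-9

print(unweighted, exact, rough)
```

Here the unweighted average is nearly 3s. too low, while rounding the weights to the nearest hundred alters the result by less than a penny.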

This principle is of great importance. In many cases the
true weights are incalculable or even undefinable; but now it
is seen that, given certain conditions, there is no need to calculate
or define the weights; in many other cases the weights
cannot be known exactly, but exactness is not necessary. No
system of weights, however, can remove an original bias
common to all the items. If, for example, wages throughout
were 1s. less than here reckoned, the calculated average would
be 1s. too high. So we arrive at a very important precept:
in calculating averages give all care to making the items free
from bias, and do not strain after exactness in weighting.
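
The statement that no system of weights removes a common bias follows in one line. In the notation assumed here (it is not that of the text), let the items be x_i, the weights w_i, and let every item be overstated by the same amount b:

```latex
\frac{\sum_i w_i\,(x_i + b)}{\sum_i w_i}
  = \frac{\sum_i w_i x_i}{\sum_i w_i} + b\,\frac{\sum_i w_i}{\sum_i w_i}
  = \frac{\sum_i w_i x_i}{\sum_i w_i} + b .
```

The error b passes into the average unchanged, whatever the weights may be; with b = 1s. this is the example just given.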

C.  Statistical  Coefficients. — A  statistical  coefficient  is  a 
number,  whole  or  fractional,  by  which  a  total  (e.g.,  population) 
must  be  multiplied  to  give  an  allied  number  (e.g.,  number  of 
births). Thus if the birth-rate is 28 per 1000, the coefficient
is .028. These coefficients play an important part in ordinary
statistics  and  a  very  interesting  role  in  the  application  of  the 
law  of  error  to  demography.  The  population  may  increase 
or  diminish,  but  the  coefficients  relating  to  certain  numbers 
fluctuate  within  narrow  limits  and  only  after  a  considerable 
period  show  any  significant  change  in  normal  times;  and  by 
their  use  the  statistics  of  different  countries  can  be  compared, 
and  numbers  for  future  years  can  be  forecasted  in  some  cases 
with  marvellous  accuracy,  subject  only  to  the  chance  of  some 
great  catastrophe.  Coefficients  can  be  formed  for  births  (in 
various  districts),  for  deaths  (according  to  age,  profession,  or 
disease),  for  marriages  (at  various  ages),  for  suicides,  crimes, 
accidents, consumption of various commodities; if the preliminary
data could be obtained, for the number of persons
crossing  Westminster  Bridge  in  the  year,  the  number  of  visitors 
to  the  Monument,  the  number  of  umbrellas  left  in  the  train, 
and  so  on ;  the  list  could  be  prolonged  indefinitely.  The  more 
important  coefficients  are  calculated  for  most  civilised  countries 
and  published  in  statistical  reports.  A  knowledge  of  them  is 
necessary  for  statistical  investigations. 

It is clear that such coefficients are essentially only a special
way of writing a certain class of arithmetic averages, and with
reference to them we may discuss more generally the relationship
between the terms used on p. 84.

Average (A) × number to which it applies (N) = total
quantity (Q) dealt with,

or A = Q/N, Q = N × A.

Thus in the case of births A is the coefficient, N the population,
Q the number of births.

So far as is practicable, a movement of Q should reflect
change in only one factor. If N is the whole population, Q
will be affected by changes in the sex and age distribution of
the population, and by the number of marriages and age at
marriage, as well as by fecundity. Methods of securing strict
comparability between the denominators in the cases of birth,
marriage and death rates (by means of correcting factors *)
are in common use. When these methods are not applicable
we may fall back on the rule given by Bertillon (Cours élémentaire,
pp. 94 seq.): effects (Q) should be compared with their
immediately productive causes (N); thus in the case of marriages,
the question should be put “what persons are capable
of marrying?” and the answer is adult bachelors or spinsters
or widowers or widows, and the total of these groups gives N.
The rule may be extended to include persons or things indirectly
concerned or affected; thus the output of coal may be considered
in relation to coal-hewers (the immediate producers)
or to all employed at coal-mines, and the output of domestic
coal in relation to the number of private consumers.† To
eliminate  all  factors  but  one,  the  entries  in  the  numerator 
should  be  homogeneous,  the  entries  in  the  denominator  should 
be  homogeneous,  and  the  potential  relation  of  a  person  or 
thing  included  in  the  denominator  to  one  in  the  numerator 
should  be  uniform.  For  example,  the  average  value  of  exports 
per  head  of  the  population  satisfies  none  of  these  conditions ; 
exports  make  a  heterogeneous  mass,  the  population  consists 
of  both  sexes  and  all  ages,  and  only  part  of  the  productive 
power  of  the  nation  is  directed  to  the  foreign  market. 

The crude coefficients and averages, however, have their
use; if they change, some factor or factors have changed, and
if it is known that all but one are nearly constant, the coefficients
move with an identified factor. Thus if N is the
population, n the number of marriageable persons, and M
the number of marriages, the crude coefficient is given by

C = M/N = (M/n) × (n/N); if n/N is constant C varies with M/n,
the more logical coefficient.
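
The factorisation of the crude coefficient may be verified with exact fractions. The figures in the sketch below are hypothetical and chosen only for round numbers; the variable names are likewise invented here.

```python
from fractions import Fraction

# HYPOTHETICAL figures for one country in one year.
population = 1_000_000     # N
marriageable = 400_000     # n
marriages = 8_000          # M

crude = Fraction(marriages, population)      # C = M/N
logical = Fraction(marriages, marriageable)  # M/n, the more logical coefficient
share = Fraction(marriageable, population)   # n/N

# C is the product of the logical coefficient and the marriageable share.
assert crude == logical * share

print(float(crude), float(logical), float(share))  # 0.008 0.02 0.4
```

If the share n/N stays at .4, any change in the crude coefficient .008 must come from a change in M/n.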

D.  The  Mode. — We  pass  to  the  consideration  of  two  other 
means in common use among statisticians but unfortunately
not yet consciously introduced into common parlance. There
are, however, some popular phrases which, if they have any
definite meaning, very nearly resemble the averages in question.

The average man.

When we hear of the average clerk, the
average working-man, the phrases admit many
interpretations. In some way these persons are supposed to
be types of their kind. The average clerk may be supposed
to mean the one who receives the average income of all clerks,
whose expenditure on necessaries and on luxuries is the
average of all of his class, who takes the average amount of
interest in his work, is of average ability and average age. It
will be seen that this clerk is ideal, and not to be found in any
random assembly of half-a-dozen; for each of these will have
some peculiarity, some quality in which he differs from the
average; the average man of the newspapers does not exist
in the flesh, but is an imaginary person to whom certain
attributes are attached.

* See Elementary Manual of Statistics (by the present author), pp. 105-7,
and Statistical Journal, 1906, pp. 34-147.

† See a discussion on homogeneity, comparability and relativity, Statistical
Journal, 1908, pp. 463-8.

Quetelet's average man.

Quetelet's average man is familiar;* he is of average height,
weight, strength, girth and lung capacity, with eyes of normal
range and medium tint; but he is a more satisfactory
model than the newspapers' average, for
in regarding him we see the type from which all other men
may be supposed to have deviated; the creature that would
have been produced if all disturbing causes were removed.
That any actual person should answer exactly to all these
standards is of course in the highest degree improbable.

Quetelet refers neither to the arithmetic average, nor to
the median or the mode (defined in the sequel), but to a mean
about which all the similar measurements are grouped in
accordance with a definite law, the obedience of anthropometrical
measurements to which was his chief theme.

The mode.

The newspaper average, on the other hand, seems to be
the mode, the position of the greatest density, which may be
explained as follows: Referring back to the table
of American wages, p. 69, or to the table on next
page, it will be noticed that in looking down column 2 we find
the numbers increase till we come to 685 (between $1.15 and
$1.24), and then after fluctuations diminish. This number,
685, is the greatest which occurs in any 10-cent group.

* See Quetelet's Physique Sociale; and Edgeworth in Statistical Journal,
December 1893.
Determination of the Mode.

Numbers of Wage-Earners from the Senate Report, 1893, U.S.A.

The value of the graded quantity in a statistical group (of
wages,  heights  or  some  other  measurable  quantity)  at  which 
the  numbers  registered  are  most  numerous  is  called  the 
mode,  or  the  position  of  greatest  density,  or  the  predominant 
value.  In  the  case  of  a  group  that  is  represented  by  a 
continuous  curve  the  value  is  the  abscissa  of  the  maximum 
ordinate. 

Method of determining the mode.

In this column 2 we have, however, 14 maxima in the
correct sense of the word, the numbers rise and fall with little
regularity, and there are 14 modes of which that at
$1.15-$1.24 is the most pronounced. But if the
groups are made wider, and the numbers entered
as in column 6 in half-dollar limits, there are only three modes,
or if we neglect the small group of 8 at $5.00 only two. The
position of the largest group of 1472 is not at once assignable
more closely than as between .75 and 1.25.

A further method of approximating to the mode may be
illustrated as follows: When the numbers are tabulated in
10-cent groups, as on p. 97, the mode is quite indeterminate;
in 20-cent groups the successive numbers beginning at $.25-$.44
are 16, 144, 270, 370, 989, 557, 538, 531, etc., and the number
989 (in the group $1.05-$1.24) is a distinct mode; if we begin
the 20-cent groups at $.35-$.54, the numbers are 74, 242, 282,
505, 784, 924, 274, etc., and 924 (in the group $1.35-$1.54) is a
mode; by this double tabulation it is seen that the 20-cent
grouping does not decide the mode. In 30-cent groups we
have 355, 674, 1242 ($1.15-$1.44), 740, etc., if we begin with
$.55-$.84; we have 439, 1190 ($.95-$1.24), 1023, etc., if we
begin with $.65-$.94; and 483, 1088 ($1.05-$1.34), 996, etc., if
we begin with $.75-$1.04: the mode by each of these groupings
lies in a group which contains $1.15 to $1.24, and this smaller
group may be assumed to contain the mode, which is thus at
or near $1.20. The example here taken is drawn from a group
of very irregular figures, which specially illustrate the difficulties.
The method just adopted may be summarised thus:
Tabulate the figures again and again in gradually widening
groups till regularity is obtained; then examine again the
groups which have the selected width and see if the mode is
shifted when the lower limit of the grouping is moved; if it
is shifted the groups are not wide enough; if it is not, the mode
is in the smallest group common to the larger equal groups
which all contain it. A diagrammatic method is described
on p. 138.
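
The procedure summarised above may be sketched in Python. Two assumptions should be noted: the 10-cent numbers are not printed in this passage, and those used below have been recovered by subtraction from the two 20-cent tabulations quoted, together with the figure 685; they cover only the groups from $.25 to $1.84, it being assumed that no larger groups occur beyond. The function names are invented for the illustration.

```python
# Numbers in 10-cent groups, $.25-$.34 up to $1.75-$1.84 (recovered as described).
COUNTS = [1, 15, 59, 85, 157, 113, 169, 201, 304, 685, 99, 458, 466, 72, 202, 329]
FIRST_LOWER_LIMIT = 25   # cents
STEP = 10                # cents


def regroup(counts, width, offset):
    """Sum `width` consecutive groups, beginning at index `offset`.

    Groups before `offset` and an incomplete final run are left out.
    Returns (index of first group, total) pairs.
    """
    return [(start, sum(counts[start:start + width]))
            for start in range(offset, len(counts) - width + 1, width)]


def modal_span(counts, width, offset):
    """Indices of the original groups making up the largest wide group."""
    start, _ = max(regroup(counts, width, offset), key=lambda g: g[1])
    return set(range(start, start + width))


def common_modal_groups(counts, width):
    """Original groups lying in the modal wide group for every lower limit."""
    spans = [modal_span(counts, width, offset) for offset in range(width)]
    return sorted(set.intersection(*spans))


for width in (2, 3):
    common = common_modal_groups(COUNTS, width)
    limits = [(FIRST_LOWER_LIMIT + STEP * i, FIRST_LOWER_LIMIT + STEP * i + STEP - 1)
              for i in common]
    print(width * STEP, "cent groups:", limits)
```

With 20-cent groups the two tabulations have no group in common, so the width is insufficient; with 30-cent groups the three tabulations share only the group of 115 to 124 cents, in agreement with the text.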

Indefiniteness of the position of the mode.

Even when our numbers are initially regular, it is seldom
easy to determine the mode exactly. The difficulty
is best seen by an example. Suppose that
we have the following returns as to heights of
a large number of men:

    67   in.    455
    67¼  in.    475
    67½  in.    490
    67¾  in.    500
    68   in.    485
    68¼  in.    467
    68½  in.    445


At first sight the mode appears to be at 67¾ in. exactly; but it
must be remembered that even in accurate measurements all
heights within ⅛ in. of 67¾ in. will be entered as 67¾ if the
measurements are taken to the nearest quarter inch, or will
have been tabulated in this way if the measurements were
more accurate. Hence 67¾ in. in reality stands for from 67⅝ to
67⅞ in. If the 500 heights so entered were distributed uniformly
through this interval, the mode might be given as 67¾ in.
with fair accuracy; but there are signs in the figures that the
mode is below this. Suppose that the figures in reality come
from the following measurements:


    From 67¼ to 67⅜ in.    238  \
         67⅜ to 67½ in.    245  /   483 at 67⅜ in.
         67½ to 67⅝ in.    245  \
         67⅝ to 67¾ in.    250  /   495 at 67⅝ in.
         67¾ to 67⅞ in.    250  \
         67⅞ to 68   in.    243  /   493 at 67⅞ in.
         68   to 68⅛ in.    242


and that these had been tabulated as in the last column, the
mode would appear as 67⅝ in.; while the same figures tabulated
as before gave it as 67¾ in. The probability of some such
shifting is seen from the original grouping, where the number at
67½ in. is greater than that at 68 in. From this discussion we
may see that the mode is always a little indefinite, depending
on the width of the groups in which the items are tabulated,
and on the exact position of the limits of the groups. As the
items we deal with become more numerous, we shall find
regularity when they are tabulated in narrower groups, and
the mode can be assigned with greater accuracy.

A mathematical method (p. 228) suggests that the mode of
such a group as given by the heights can be determined by
dividing the interval containing the mode (67⅝ to 67⅞ in.) in
proportion to the differences between the numbers registered
in this interval and the numbers in the adjacent intervals, viz.:
500-490 : 500-485 = 10 : 15. The mode so computed is at

(67⅝ + 10/25 of ¼) in. = 67 29/40 in.

By this method if two
intervals contained the same numbers the mode would be
placed at the value dividing them, and if the numbers on
either side of that containing the greatest number were equal
(if the grouping was symmetrical) the mode would be placed
at the centre of the middle interval, in both cases as we
should have determined a priori.
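
The rule of proportional division can be written as a short function and applied to the heights. This is a sketch only: the name `interpolated_mode` and its arguments are invented here, the interval is taken as 67⅝ to 67⅞ in. (width ¼ in.), and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def interpolated_mode(lower, width, before, modal, after):
    """Divide the modal interval in the ratio (modal - before) : (modal - after).

    `lower` is the lower limit of the modal interval, `width` its width;
    `before`, `modal`, `after` are the numbers in the preceding, modal and
    following intervals. All three being equal would leave the ratio undefined.
    """
    d1 = modal - before
    d2 = modal - after
    return lower + width * Fraction(d1, d1 + d2)


mode = interpolated_mode(Fraction(541, 8), Fraction(1, 4),
                         before=490, modal=500, after=485)
print(mode)         # 2709/40, i.e. 67 29/40 in.
print(float(mode))  # 67.725
```

The two special cases mentioned in the text follow at once: equal numbers on either side give the centre of the interval, and a neighbouring interval equal to the modal one gives the dividing value.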

The “average man.”

Now is the “average workman” the man who earns $1.73
per diem, the simple average of the whole group on p. 69, or a
man making $1.20, the mode? In ordinary speech
the latter is meant. The “average clerk” is not
the  one  whose  measurable  qualities  are  an  arithmetic  mean  of 
all  similar  qualities,  but  one  whose  qualities  are  found  in  the 
same  degree  in  the  greatest  number  of  his  fellows.  There  are 
more  clerks  who  read  the  evening  paper  than  who  read  Homer, 
more  who  go  to  music-halls  than  to  oratorios,  more  whose 
incomes  are  £100  than  £500,  more  who  live  four  miles  from 
the  City  than  one  or  twenty.  Even  with  this  explanation  the 
average  man  is  not  a  real  creature,  for  fortunately  no  individual 
has  no  qualities  out  of  the  common.  The  fact  that  the  average 
is a pure abstraction is of importance directly we apply statistics
to actual affairs; these American workpeople cannot be
legislated  for  in  the  mass  as  if  they  all  earned  $1.20,  or  as  if 
those  who  were  alike  in  this  did  not  differ  in  other  respects, 
even doing very varying quantities of work for this wage.

Importance of the mode.

No single measurement expresses completely even the economic
condition of a group of workmen, but if we are
taking a single measurement, that of the “mode”
is often the most useful. It is at the mode that we find the
greatest  number  of  whose  greatest  good  we  may  be  thinking. 
Whereas the arithmetic mean and the “median” (defined
below) may correspond to no reality but be merely numerical
conceptions,  the  mode  is  precisely  that  number  for  which  most 
instances  can  be  found.  It  shows  the  commonest  result,  that 
most  often  obtained,  and  is  of  very  general  application.  For 
an  intending  passenger  by  train  or  'bus,  it  is  more  important 
to  know  the  most  ordinary  than  to  know  the  average  number 
in  a  compartment.  The  mode  rather  than  the  average  in 
chest  measurements  is  the  number  most  suitable  for  the 
ready-made  clothier.  For  providing  a  post-office  or  a  store, 
the  mode  in  postal  orders  or  prices  of  tea  needs  to  be  known 
rather  than  any  other  average.  Even  the  favourite  coin  in  a 
collection  may  show  the  spirit  of  the  congregation  better 
than  the  arithmetic  average  of  their  contributions.  In  these 
last  instances  it  may  be  noticed  that  the  mode  is  quite  definite. 

Advantages of the mode.

A special feature of the mode is that it is entirely uninfluenced
by extremes. A cheque for £1000 in a collection
disturbs the arithmetic average, but not the
mode. The incomes of a small number of millionaires
and an army of paupers may have the same arithmetic
average  as  a  nation  composed  entirely  of  people  moderately 
well  off ;  but  the  modes  will  be  very  different  in  the  two  cases. 
In  considering  the  change  year  by  year  in  a  group  of  figures, 
as  for  instance,  the  wages  of  a  large  group  of  workmen,  we 
cannot  tell,  if  we  take  the  arithmetic  average  as  our  criterion, 
whether  an  improvement  is  due  to  a  levelling  up  of  the  badly 
paid  or  a  rapid  increase  for  those  who  were  already  well  off, 
while  the  mode  will  show  the  changing  position  of  the  main 
body.  Mr.  Booth's  London  is  crowded  with  instances  of 
the  use  of  the  mode.  Each  age  diagram  shows  the  mode  in 
ages  for  an  occupation;  each  wage  list  that  in  wages.  His 
whole  description  of  Class  E,  the  typical  workmen  of  modern 
towns,  is  based  on  the  same  principle.  His  measurement  of 
social  status,  based  on  the  number  of  rooms  occupied  or  servants 
employed,  can  be  used  easily  for  stating  the  mode  (four  rooms 
to  a  family  and  no  servant)  but  not  any  other  average. 

Shortcomings of the mode.

An objection to this average is that there are many groups
of figures to which it is not applicable. If we have a very
irregular group of numbers with no particular
type, such as the populations of towns in England,
the mode would be quite indefinite, and would give no
information of importance. The use of the mode is to indicate
the type from which other figures may be regarded as diverging.
Thus,  in  these  wage  figures,  the  type  is  about  $1.20,  and  other 
examples  lie  on  either  side,  wages  of  men  who  have  for  some 
reason  or  other  more  or  less  than  the  normal  degree  of  skill 
or  opportunity.  If  there  is  a  type,  as  in  Quetelet’s  instances, 
the  mode  will  show  it.  The  mode  only  tells  us  one  fact, 
however,  about  each  type,  and  it  is  necessary  to  supplement 
it  with  other  measurements. 

E.  The  Median. — When  we  are  dealing  with  a  group  of 
persons or things, each of which possesses some measurable
attribute, such as height or wage, we can choose certain quantities
which describe the group in brief. Suppose all the items
arranged in a series in ascending order of the magnitude of
this attribute; the magnitude appertaining to the item half-way
up the series is called the median.* Thus if in a group
of wage-earners 200 earn less than 20s. 3d., one earns 20s. 3d.,
and 200 more, 20s. 3d. is the median wage. There are as
many items below 20s. 3d. in the supposed series as above it.
The magnitudes one-quarter and three-quarters up the series
are called the quartiles;* those one, two ... nine-tenths up
are the deciles; those one, two ... ninety-nine hundredths
up are the percentiles. The median is more definite in position
than the mode. When we are dealing with exact measurements,
if we have an odd number of items it is the middle one,
if an even number, it lies between the two middle items,
which are in general near together, or coincides with them if
they are equal. If the magnitudes are not given exactly,
but as within small limits, we can by the method described on
pp. 106-7 make a good estimate of their actual values. The
median is not affected by exceptional entries at all; the existence
of any number of millionaires has no more effect on the
median income than of an equal number of any other persons
whose incomes are above the median. For many purposes
it is of course necessary to allow these extreme instances more
weight than those which are nearer the average; but the
arithmetic average often gives them undue weight for this
democratic age, since a single millionaire can counterbalance
thousands of ordinary working men. A further advantage is
that it is extremely simple to find, not needing much arithmetical
work, for we need not do more than count those well
above and well below the average, and look more carefully
at those near it.

* These quantities have already been used in tabulation, p. 70.
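
The indifference of the median to extreme entries may be shown with the example of 401 wage-earners given above. In this sketch the individual wages other than the middle one (243 pence, or 20s. 3d.) are hypothetical; only the standard `statistics` module of Python is used.

```python
import statistics

# 200 men below 20s. 3d. (243d.), one at it, 200 above; the wages of the
# other 400 men are HYPOTHETICAL round figures in pence.
wages = [200] * 200 + [243] + [300] * 200

median_before = statistics.median(wages)
mean_before = statistics.mean(wages)

# Replace one of the better-paid men by a "millionaire".
wages[-1] = 1_000_000
median_after = statistics.median(wages)
mean_after = statistics.mean(wages)

print(median_before, median_after)   # 243 243
print(mean_before < mean_after)      # True: the mean is carried far upward
```

The quartiles, deciles and percentiles may be had in the same way from `statistics.quantiles`, with `n=4`, `n=10` and `n=100` respectively.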

No need for complete information.

There is a yet more important advantage in the use of the
median; it can often be found exactly, when our information
as to the items in question is neither accurate nor
complete. This will be clear from one or two
examples. It may be that in a “wage census”
100,000 persons, whose wages were far below the average,
do  not  come  into  the  returns  at  all,  and  it  is  very  difficult 
to  estimate  their  effect  on  the  arithmetic  average  for  want 
of  information  as  to  their  earnings ;  but  to  find  the  median 
exactly,  we  need  only  know  their  number,  not  their  earnings ; 
and  if  we  can  only  assign  a  maximum  for  their  number,  we  still 
can  place  the  median  within  narrow  limits.  The  addition  of 
100,000  men  with  wages  below  15s.  to  a  general  summary  for 
the  356,000  men  on  p.  470  of  the  General  Report  on  Wages  in 
1886  (C. — 6889),  would  still  leave  the  median  in  the  group  20s. 
to  25s.  where  it  already  is ;  the  change  would  be  very  marked, 
however,  in  the  lower  deciles  and  quartiles,  and  the  arithmetic 
average would be lowered by at least 2s. 1d. The same argument
applies to incomes; information is often very deficient,
but  it  is  in  many  cases  possible  to  assert  that  a  number  of 
men,  whose  exact  income  is  unknown,  receive  above  a  certain 
assigned  sum,  or  even  between  two  assigned  limits,  which  is  all 
we  need  to  know  about  them  to  determine  the  median,  if  it 
lies  below  the  lower  limit. 

Again,  in  tracing  the  history  of  wages  throughout  the 
century  it  is  often  very  difficult  to  find  the  correct  average, 
but  at  the  same  time  it  is  frequently  possible  to  say  that  a 
very  large  class  of  men  earned  below,  say,  15s.  a  week,  and 
another  very  large  class  above  30s.  whose  wages  we  do  not 
exactly  know,  and  a  more  definite  number  between  15s.  and 
20s.,  and  25s.  and  30s. ;  and  in  order  to  find  the  median  all 
we  need  to  do  is  to  investigate  more  exactly  the  wages  between 
20s.  and  25s.,  if  that  is  the  grade  which  contains  it ;  and  even 
if  we  have  not  complete  information  here,  we  can  still  say 
that  the  median  certainly  lies  between  certain  narrow  limits. 
[Margin: Incommensurable quantities.]

There is yet another advantage, perhaps more important, that
the median is applicable to quantities which are
not capable of measurement at all. This development
is especially due to Galton.* Suppose it to be required,
for example, to find among a large class of boys the average
in intelligence. It is clear that it is not easy to find the arithmetic
average of a quantity which cannot be properly measured
even by the most elaborate system of marks, but on the other
hand it would not be at all difficult with a class of, say, twenty
boys, to place them in order of intelligence without committing
oneself to such a statement as that A's cleverness was
25 per cent. more than B's; and the tenth or eleventh boy
in this arrangement would show the style of boys in the class,
at least as well as any other average.

[Margin: Disadvantages.]

The disadvantage of
this method, the reason why it is not universally applicable,
is that the median of a series of observations
may be totally removed from its type, and in
fact may not be situated near any of the different objects
which are observed. Thus, if we had two large groups of
wages, of a thousand men between 15s. and 25s., and another
thousand between 35s. and 45s., the median would give us
any position between 25s. and 35s., where as a matter of fact
not a single wage-earner would be found. The median is
then chiefly useful when we are dealing with a series of objects
of which the main part lie fairly close together; a few extremes
do not affect it.†


If $m$ is the median and $a$ the arithmetic average of $n$ quantities $x_1, x_2 \ldots x_n$,
and we call $x_1 - z$, $x_2 - z \ldots$ the deviations of the $x$'s from any quantity $z$,
then $m$ is the value of $z$ which makes the sum of the deviations (all taken
positively) a minimum, $a$ is the value which makes the sum of the squares a
minimum. The first statement becomes obvious from the following analogy:
suppose $2n + 1$ places in a straight line are each served by a single wire from
a telephone exchange at the $n$th place from one end; the lengths of the
wires correspond to the deviations; now if the exchange is moved to the
$(n + 1)$th (or central) place, $n + 1$ wires are shortened and $n$ wires lengthened
each by the same distance, so that the aggregate of wire is diminished; if
the number of places is even ($2n$), the minimum is obtained at any position at
or between the $n$th and $(n + 1)$th from either end. For the second, we notice
that $\Sigma x = na$, and that $\Sigma (x - z)^2 = \Sigma x^2 - na^2 + n(a - z)^2$, which is a minimum
when $z = a$.
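The two minimum properties stated in this note can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the five values are hypothetical and chosen only so that the median (6) and the arithmetic average (10) differ, and the helper names are illustrative.

```python
from statistics import mean, median


def sum_abs_dev(values, z):
    """Sum of the deviations from z, all taken positively."""
    return sum(abs(x - z) for x in values)


def sum_sq_dev(values, z):
    """Sum of the squares of the deviations from z."""
    return sum((x - z) ** 2 for x in values)


# Hypothetical observations (not from the text): median 6, average 10.
values = [3, 5, 6, 9, 27]
m = median(values)
a = mean(values)

# Try every z from 0.0 to 30.0 in steps of 0.1 and keep the best one.
candidates = [k / 10 for k in range(0, 301)]
best_abs = min(candidates, key=lambda z: sum_abs_dev(values, z))
best_sq = min(candidates, key=lambda z: sum_sq_dev(values, z))

print("median:", m, " z minimising absolute deviations:", best_abs)
print("average:", a, " z minimising squared deviations:", best_sq)
```

With an odd number of items the minimising value for the absolute deviations is unique; with an even number any value between the two middle items would serve equally well, as the note says.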


* See, for instance, Natural Inheritance, p. 47.

† On the relative advantages of this, and a more mathematical method,
see Yule and Galton in the Statistical Journal for 1896, especially pp. 392-398.

The following table shows the description of 76 items by
the help of the various averages now described:

Measurements of Boys of Ages 13 to 15 Years.

(A + after a figure denotes an added fraction of an inch or of a pound that is not legible in this copy.)

       Age.       Height.   Weight.           Age.       Height.   Weight.
No.    yrs. mth.  ft. in.   st. lb.    No.    yrs. mth.  ft. in.   st. lb.
  1    14   1     4  11+    6   0+      39    14   7     4  11+    6   3+
  2    14   9     4  10     5   7       40    13   1     4  11+    5   7
  3    14   7     5   5+    7   5       41    14   3     4  11     6   4+
  4    13  11     5   0     6   3+      42    13   3     4   4+    4  11+
  5    14  11     5   3+    8   0+      43    14   3     5   3     6   7+
  6    14   7     4  10     5   0       44    13   6     5   1+    6  13+
  7    14   3     4  10     6   7       45    14   2     4   8+    6   0+
  8    14   9     5   5     8   5+      46    13   5     5   2     7   4
  9    14  11     4   9+    5  12+      47    13   8     5   2+    6  11
 10    14   3     4  11+    6  11+      48    14   6     5   4     7   4+
 11    13   4     4   7     5  12       49    14   8     5   1+    6  10
 12    14   7     5   3+    7   8+      50    13   3     4   8+    5   0
 13    13   8     4   7+    5   3       51    13   0     5   1+    6   7
 14    14   5     5   2+    7   8+      52    13  10     4  11+    7   3+
 15    14   4     5   0     6   0       53    14   8     4  11+    6   9+
 16    13   6     4   9     5   6       54    13   8     4   5+    4   9+
 17    14   0     5   2+    7   7+      55    14   8     5   4+    7   0
 18    13   0     4   8+    5   3       56    14   0     4  10     6   2+
 19    14   7     4  11     6  12+      57    13  10     4   9     5   5
 20    14  10     5   1     6   9       58    13   2     5   0+    6   4
 21    13   9     4  11     5  11       59    13   6     4   7     5   2+
 22    14  10     4   8+    5  11       60    13   0     4   9     5   9+
 23    13   4     4   9+    5   8+      61    13   3     4   8+    5   5+
 24    13   1     5   2+    6   1       62    13   5     4   8+    6   5+
 25    14   0     4   6+    5   6+      63    13  10     5   5+    7  10+
 26    14   6     5   3+    7   6+      64    13   1     4   8+    6   2+
 27    14   3     5   0+    5  11+      65    13  10     5   4     7   2
 28    13   9     4   9     5  11       66    14   0     4   9     5   0+
 29    13   4     5   1+    5   9       67    13   3     4   7     5   0
 30    14   4     5   1     6   8+      68    13   8     4  11     6   1+
 31    14  10     4   9+    4   7+      69    13   7     4  11+    6   4+
 32    13   2     4   9+    5  13+      70    13  11     4   8     4   4+
 33    14   1     4   8+    5   8+      71    13  11     4   8     4   4+
 34    13  10     5   2+    6   8+      72    13   2     4   7+    4  10
 35    14   0     4  11+    5   7       73    14   0     4  11     6   5
 36    14   4     4  11     6   5       74    13   3     4   3+    4   1+
 37    14   8     4  11     6   0+      75    13   3     5   0     7   2+
 38    13   7     5   0+    6   2       76    13   7     4   8+    5   6

Tabulation of Weights.

Arithmetic average, 6 st. 1+ lbs.
The same, when weights are entered only to nearest stone, 6 st. 1+ lbs.
Median, 6 stones 1+ lbs.
Quartiles, 6 st. 9+ lbs., 5 st. 6+ lbs.
Average of quartiles, 6 st. 1 lb.
Half of the examples lie within 9 lbs. of median.
Mode is between 6 st. and 6+ st.
Average weight between ages 13 and 13½ years, 5 st. 9+ lbs.; 13½ and 14 years, 5 st. 13+ lbs.; 14 and 14½ years, 6 st. 3+ lbs.; 14½ and 15 years, 6 st. 8+ lbs.
Heights may be tabulated in the same way.

Heights arranged in order of magnitude (in.):

51+, 52+, 53+, 54+, 55, 55, 55, 55+, 55+, 56,
56, 56+, 56+, 56+, 56+, 56+, 56+, 56+, 56+;
56+, 57, 57, 57, 57, 57, 57+, 57+, 57+, 57+,
58, 58, 58, 58, 59, 59, 59, 59, 59;
59, 59, 59+, 59+, 59+, 59+, 59+, 59+, 59+, 59+,
60, 60, 60, 60+, 60+, 60+, 61, 61, 61+;
61+, 61+, 61+, 62, 62+, 62+, 62+, 62+, 62+, 63,
63+, 63+, 63+, 64, 64, 64+, 65, 65+, 65+.

[Margin: Graphic method.]

A graphic method of finding the median of these heights
closely is given by Mr. Galton in the Report of
the Anthropometric Committee of the British
Association, 1881, p. 247; and is illustrated by the diagram
facing this page.

On a horizontal line mark off equal intervals representing
units of measurement, say inches. On a vertical scale mark
off equal intervals representing the number of instances,
i.e., persons whose heights are measured. Beginning at the
lowest, 51 in. and a fraction, on a vertical line mark as many dots at
equal intervals on the vertical scale as there are persons at
that height (in this case only one), so that each dot represents
one person. From the highest dot thus marked, suppose a
horizontal line drawn till it is over the next height division
at which there is an instance, 52 in. and a fraction, and with this new
base proceed as before, marking each instance at that height
by a dot vertically above its mark on the horizontal scale. Next draw a
connected line through the middle points of the consecutive
vertical rows of dots; if there is an odd number of dots, the
middle one is taken as the middle point; if an even number,
the middle point is half-way between the middle ones.

On  the  vertical  scale  mark  the  positions  of  the  median, 
quartiles,  etc.,  obtained  by  dividing  the  distance  representing 
the  total  number  of  instances  into  appropriate  parts,  and 
through  these  points  draw  horizontal  lines  to  intersect  the 
connected  line  already  drawn.  The  points  of  intersection 
lie  vertically  above  the  heights  required,  as  marked  on  the 
horizontal  scale. 

Now  it  may  be  assumed  that  the  heights  of  all  persons 
returned  at,  say,  58 J  in.,  are  in  reality  evenly  distributed 
between  the  limits  58 J  and  58J  in.,  heights  lying  within 
which  would  be  so  returned;  and  it  can  be  verified  that  the 
construction  just  given  shows  the  place  of  the  median,  deciles, 
etc.,  almost  exactly  on  this  hypothesis. 

The  following  analysis  is  only  important  when  the  number  of  instances 
is  small,  and  the  position  of  the  quartiles,  etc.,  is  not  evident.  There  are 
two  cases,  (1)  where  the  observations  are  exact,  (2)  where  the  observations 
are  given  in  grades  or  to  the  nearest  scale  mark. 

(1) The following 45 numbers are the numbers of minutes occupied by
trains on a certain distance according to time-table:

45, 46, 47, 48, 48, 51, 53, 54, 55, 58; 61, 61, 62, 65, 65, 69, 69, 69, 71, 76;
76, 76, 77, 77, 78, 80, 81, 81, 82, 82; 83, 83, 84, 85, 85, 85, 85, 87, 88, 89;
90, 92, 94, 101, 103.


[Diagram, to face page 106: Graphic method of finding median, quartiles and deciles (after Galton: Anthropometric Committee, Brit. Assn.), for the heights of the 76 boys between ages of 13 and 15. Values marked on the diagram: median, 59 in. and a fraction; half inter-quartile distance, 2.2; deciles, 55.6, 56.6, 57, 57.9, 59.7, 60.7, 62, 63.6; arithmetic average, 59.095; greatest density, 57 or 59 (in smoothed curve would be about 58); geometric average, 58.98.]

The median is the 23rd instance, viz. 77 minutes.

To find the quartiles we must divide the 45 numbers into 4 equal parts.
Suppose that on such a scale as the vertical scale in the diagram, p. 106,
the instances are entered at ½, 1½ . . . 44½, the distance 45 representing the
whole space to be divided. The quartiles are at 11¼ and 33¾. 11¼ is between
the 11th instance (61) at 10½ and the 12th instance (also 61) at 11½, and
the lower quartile is 61. 33¾ comes between the 34th and 35th instances,
both 85. If the entries were not equal we might take ¾ of the nearer entry
(the 34th) + ¼ of the 35th.

Similarly the deciles are at the marks 4½, 9, 13½ . . . on the scale.
The lowest is at the 5th entry (48 min.), the next half-way between the
9th and 10th (56½ min.), and so on.

The positions of the D's, Q's and M on the diagram are marked on this
principle.

We have the following scheme for the median and quartiles:

No. of
Cases.    Median.                      Lower Quartile.               Upper Quartile.

4n        ½{2nth + (2n+1)th}           ½{nth + (n+1)th}              ½{3nth + (3n+1)th}
4n + 1    (2n+1)th                     ¼ nth + ¾ (n+1)th             ¾ (3n+1)th + ¼ (3n+2)nd
4n + 2    ½{(2n+1)th + (2n+2)nd}       (n+1)th                       (3n+2)nd
4n + 3    (2n+2)nd                     ¾ (n+1)th + ¼ (n+2)nd         ¼ (3n+2)nd + ¾ (3n+3)rd

A  similar  scheme  could  be  worked  out  for  the  deciles. 
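The scale rule behind this scheme can be written as one small function and tried on the 45 train times of case (1). This is a minimal sketch in Python, not part of the original text; the function name `scale_quantile` is illustrative, and the rule it encodes (the k-th of N instances entered at k - ½ on a scale of length N, with a mark between two entries shared between them in proportion) is the reading of the passage assumed here.

```python
from fractions import Fraction
from math import floor


def scale_quantile(sorted_values, fraction):
    """Return the value standing `fraction` of the way up an ordered series.

    Assumed rule: the k-th of N instances is entered at k - 1/2 on a scale
    of length N; a mark falling between two entries is shared between them
    in proportion to its distance from each.
    """
    n = len(sorted_values)
    # A mark at fraction * N on the scale corresponds to rank fraction * N + 1/2.
    rank = Fraction(fraction) * n + Fraction(1, 2)
    # Marks below the first or above the last entry are pinned to that entry.
    rank = max(Fraction(1), min(Fraction(n), rank))
    k = floor(rank)      # rank of the entry at or just below the mark
    share = rank - k     # share of the weight given to the next entry up
    if share == 0:
        return Fraction(sorted_values[k - 1])
    return (1 - share) * sorted_values[k - 1] + share * sorted_values[k]


# The 45 time-table entries (minutes) quoted in case (1).
minutes = [45, 46, 47, 48, 48, 51, 53, 54, 55, 58,
           61, 61, 62, 65, 65, 69, 69, 69, 71, 76,
           76, 76, 77, 77, 78, 80, 81, 81, 82, 82,
           83, 83, 84, 85, 85, 85, 85, 87, 88, 89,
           90, 92, 94, 101, 103]

median_value = scale_quantile(minutes, Fraction(1, 2))
lower_quartile = scale_quantile(minutes, Fraction(1, 4))
upper_quartile = scale_quantile(minutes, Fraction(3, 4))
first_decile = scale_quantile(minutes, Fraction(1, 10))
second_decile = scale_quantile(minutes, Fraction(2, 10))

print(median_value, lower_quartile, upper_quartile, first_decile, second_decile)
```

Exact fractions are used so that marks such as 11¼ and 33¾ are not disturbed by rounding; with 45 = 4n + 1 and n = 11 the function takes ¼ of the 11th entry with ¾ of the 12th for the lower quartile, in agreement with the second row of the scheme.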

(2) If the numbers are given in grades (whether as between, say, 53 and
54 in., or as at 53 in. to the nearest ¼ in., i.e., between 52⅞ and 53⅛ in.), they
may be regarded as spaced uniformly through the grade, and then the
method of case (1) applied.

The method can be illustrated from the ages of married men, column 6
of the table, p. 86. Here 549 men are over 40 years, 413 over 45 years;
the 500th man is one of the 136 in the grade 40 to 45 years, in fact the 48th
or 49th man in that grade. If they are uniformly distributed, the 49th is
at the 49th of 136 equal intervals into which the 5 years may be divided.
Hence the median is at

\[ 40 + \frac{549 - 500}{136} \text{ of } 5 = 41.80 \text{ years.} \]

It is not worth while to try to place it more exactly. Similarly the lower
quartile is the age where we find the 750th man, somewhere in the grade
30-35 years, and may be taken as

\[ 30 + \frac{855 - 750}{152} \text{ of } 5 = 33.45 \text{ years,} \]

and the upper quartile at

\[ 50 + \frac{295 - 250}{96} \text{ of } 5 = 52.34. \]

Simple  graphic  methods  may  readily  be  found  for  either  case. 
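The interpolation within a grade used above reduces to a single line of arithmetic. The sketch below, in Python and not part of the original text, repeats the three calculations; the function name and parameter names are illustrative, and the counts are those quoted in the passage (men reckoned from the oldest downwards, 1000 in all).

```python
def graded_quantile(lower_limit, width, number_above_lower_limit,
                    number_in_grade, rank_from_top):
    """Place the man of given rank inside his grade.

    Assumes, as the passage does, that the men in the grade are spread
    uniformly through it. Ranks are counted from the oldest man downwards.
    """
    steps_above_lower_limit = number_above_lower_limit - rank_from_top
    return lower_limit + steps_above_lower_limit / number_in_grade * width


# Median: 549 men over 40, of whom 136 fall in the grade 40 to 45; 500th man.
median_age = graded_quantile(40, 5, 549, 136, 500)
# Lower quartile: 855 over 30, 152 in the grade 30 to 35; 750th man.
lower_quartile = graded_quantile(30, 5, 855, 152, 750)
# Upper quartile: 295 over 50, 96 in the grade 50 to 55; 250th man.
upper_quartile = graded_quantile(50, 5, 295, 96, 250)

print(round(median_age, 2), round(lower_quartile, 2), round(upper_quartile, 2))
```

The three results agree with the 41.80, 33.45 and 52.34 years found in the text to two places of decimals.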


F. Geometric Mean. If $a_1, a_2 \ldots a_n$ are $n$ quantities,
$G$ the geometric or logarithmic mean is given by

\[ G = \sqrt[n]{a_1 a_2 \ldots a_n}, \]

and

\[ \log G = \frac{1}{n}(\log a_1 + \log a_2 + \ldots + \log a_n). \]

The geometric mean is always less than the arithmetic
mean of the same quantities, unless these are all equal, when
the two means coincide.

This mean is appropriately used when emphasis is on the
ratio between two quantities rather than on their absolute
difference. If the difference between 8 and 13 is of the same
importance as that between 13 and 18, then the mean of
8 and 18 is properly taken to be 13, equidistant from either;
but if the ratio 8 to 12 is of the same importance as that of
12 to 18, then the mean of 8 and 18 is properly taken as
$12 = \sqrt{8 \times 18}$.
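The logarithmic form of the definition lends itself directly to computation. The following minimal sketch in Python is not part of the original text; the function names are illustrative, and it simply reworks the example of 8 and 18.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive quantities, found through their logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def arithmetic_mean(values):
    """Ordinary arithmetic average."""
    return sum(values) / len(values)


pair = [8, 18]
g = geometric_mean(pair)   # 12, up to rounding in the last decimal places
a = arithmetic_mean(pair)  # 13

# The ratios 8 : G and G : 18 are equal; the differences from A are equal.
print("G =", round(g, 6), " 8/G =", round(8 / g, 6), " G/18 =", round(g / 18, 6))
print("A =", a, " A - 8 =", a - 8, " 18 - A =", 18 - a)
```

The quantities must all be positive for the logarithms to exist.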

We obtain an analogy as follows: Of five quantities $a_1$,
$a_2$, $a_3$, $a_4$, $a_5$, let $a_1$ and $a_2$ be less than $A$ (the arithmetic mean
of all) and also less than $G$, and the others be greater.

Then

\[ 5A = a_1 + a_2 + a_3 + a_4 + a_5 \]

and

\[ (A - a_1) + (A - a_2) = (a_3 - A) + (a_4 - A) + (a_5 - A), \]

and

\[ G^5 = a_1 \times a_2 \times a_3 \times a_4 \times a_5, \]

and

\[ \frac{G}{a_1} \times \frac{G}{a_2} = \frac{a_3}{G} \times \frac{a_4}{G} \times \frac{a_5}{G}. \]


Thus  in  one  case  the  sum  of  the  excesses  of  the  mean  equals 
the  sum  of  its  defects ;  in  the  other  the  product  of  the  ratios 
of  the  mean  to  the  quantities  less  than  it  equals  the  product 
of  the  ratios  of  the  greater  quantities  to  the  mean. 

An  important  use  of  the  mean  is  in  connection  with  prices. 
A  general  rise  of  prices  from  100  to  120  is  exactly  the  same 
from  many  points  of  view  as  a  rise  from  120  to  144,  and  is 
greater  than  a  rise  from  120  to  140.  This  consideration  may 
have  led  Jevons  to  use  the  geometric  mean  in  his  first  treatment 
of  index-numbers  ( Fall  in  the  Value  of  Gold). 

It  should  be  noticed  that  the  geometric  mean  gives  greater 
importance  to  small  numbers  and  less  to  large  than  does  the 
arithmetic. 


G. General. The function of means will now be clear;
it is to express a complex group by a few simple numbers.

[Margin: The function of means.]

The mind cannot grasp the magnitudes of millions
of items at once; they must be grouped, simplified,
averaged. The means chosen must be those which
will give the striking features and the essential characteristics
of the group. Different methods will apply to groups of
various classes; each must be taken on its own merits. A
good and suitable mean has the following characteristics:
If there is a type it shows it; it gives due influence to extreme
cases; it is not easily affected by errors or much displaced by
slight alterations in systems of calculation; and it is easily
calculated.

The relative positions of the different kinds of means
dealt with give some information as to the general nature of
the group to which they refer. The arithmetic mean,
median and mode are coincident, if the group is symmetrical.
The arithmetic mean is probably above the median, if we
have a small group at a high degree. The arithmetic mean
is generally below the median, if there is an absence of high
numbers, and a concentration a little above the mean.
The mode will be badly defined, if our group is not homogeneous.
The mode will probably be below the arithmetic
mean, if there is a small group at a high degree. The mode
is well marked, if the distribution is uniform. These rules are
only tentative and easily nullified by exceptional circumstances.


CHAPTER VI.


MEASUREMENTS OF DISPERSION AND OF SKEWNESS.

APPLICATION OF AVERAGES.

Measurements of Dispersion and of Skewness.

[Margin: Statistical groups.]

In the sections of Chapter V which relate to "means" we
have been concerned principally with considering the central
position of a statistical group, where by the term
statistical group we mean a number of persons or
things possessing certain defined attributes (Enumerated, in
England or Wales, in 1911, male) and grouped according to a
variable attribute (age). We can exhibit such a group either
by tabulation in grades or otherwise (pp. 69-70) or by a diagram
(p. 127), but for purposes of brevity or for comparison with
other groups we need to define and calculate measurements
related to the group in such a way as to show its characteristics.
For this purpose it is convenient to choose (i) a mean which
locates a central position, (ii) a measurement of the dispersion,
variation or scattering of the observations, and (iii) a measurement
of imperfect symmetry. We proceed to the discussion
of (ii) and (iii).

The differences between the measurements of the items of
the group and a mean or other fixed point are called deviations.

[Margin: Mean deviation.]

In the table (p. 111) the group taken contains the
death-rates of the aggregate of large towns in the
52 weeks of the year 1902. These are arranged in order of
magnitude down column 1 and up column 2. In columns 3
and 4 are shown the deviations from the quantity 173, selected
as being near the median, 172½. It was shown on p. 104 that
the total and therefore the average of deviations (all taken
positively) is least when they are measured from the median;
to obtain such deviations we must add ½ to each entry in column
3 and subtract ½ from each entry in column 4, i.e. add and
subtract 13 to or from the totals. The total of the positive
deviations from the median is then 447, of negative is 388, and
of the 52 deviations (irrespective of their sign) is 835. The
average of these deviations is 835 ÷ 52 = 16.06. To
obtain the sum of the deviations from the arithmetic average
(173.63) we must add .63 to each of the deviations
from 173 of the 28 quantities less than 174 and
subtract .63 from the remaining deviations; the total is then
837.52 and the average 16.11. The average of the differences
between the various measurements and their arithmetic average
(16.11 in this case) is called the mean deviation of the group,
and often denoted by the letter η; we may also use the term
mean deviation from the median (16.06). The mean deviation
is an obvious and convenient measurement of the dispersion of
the group, and where the observations are recorded singly and
not merged in grades is easy to calculate.


Death-rates Week by Week in 1902 in the Aggregate of
Great Towns in England and Wales.

Cols. 1, 2: Weekly death-rates per 10,000 living in order of magnitude (a, b).
Cols. 3, 4: For mean deviation from median; excess of a over 173, excess of 173 over b.
Cols. 5, 6: For standard deviation; squares of differences from 173.
Cols. 7, 8, 9: For mean difference; difference between a and b, multiplier, product.

Col. 1  Col. 2  Col. 3   Col. 4   Col. 5   Col. 6   Col. 7  Col. 8      Col. 9
  a       b     a - 173  173 - b  Square   Square   a - b   Multiplier  Product

 244     136      71       37     5,041    1,369     108       51        5,508
 233     139      60       34     3,600    1,156      94       49        4,606
 226     141      53       32     2,809    1,024      85       47        3,995
 209     143      36       30     1,296      900      66       45        2,970
 206     144      33       29     1,089      841      62       43        2,666
 201     145      28       28       784      784      56       41        2,296
 196     149      23       24       529      576      47       39        1,833
 196     150      23       23       529      529      46       37        1,702
 196     151      23       22       529      484      45       35        1,575
 191     152      18       21       324      441      39       33        1,287
 183     154      10       19       100      361      29       31          899
 182     155       9       18        81      324      27       29          783
 182     159       9       14        81      196      23       27          621
 181     160       8       13        64      169      21       25          525
 179     164       6        9        36       81      15       23          345
 177     165       4        8        16       64      12       21          252
 177     166       4        7        16       49      11       19          209
 177     166       4        7        16       49      11       17          187
 176     167       3        6         9       36       9       15          135
 176     169       3        4         9       16       7       13           91
 176     169       3        4         9       16       7       11           77
 174     169       1        4         1       16       5        9           45
 174     170       1        3         1        9       4        7           28
 174     170       1        3         1        9       4        5           20
 173     172       0        1         0        1       1        3            3
 173     172       0        1         0        1       1        1            1

Totals:
Cols. 1 and 2 together, 9,029.
Col. 3, 434; Col. 4, 401. Adding 13 to the first and subtracting 13 from the second gives 447 and 388, together 835.
Cols. 5 and 6 together, 26,491.
Col. 9, 32,659.

Arithmetic average 9029 ÷ 52 = 173.63 approx.
Median 172½.
Quartiles 159½, 181½.
Mean deviation from the median: η = 835 ÷ 52 = 16.06 approx.; from the average, η = 16.11.
Quartile deviation, or probable error: r = ½(181½ - 159½) = 11. Half the cases are within 170½ ± 11.
Standard deviation: σ = √{1/52 of 26,491 - .63²} = 22.56.
Mean difference: g = 32,659 ÷ ½ of 52 × 51 = 24.63.
Coefficient of variation = 100σ ÷ 173.63 = 13.0.


We can obtain the arithmetic average from columns 3 and 4
at once by the consideration that the average excess over 173 is
the total of column 3 (434) less the total of column 4 (401), divided by 52,
= 33 ÷ 52 = .63 approx.; the average is therefore 173.63 approx.
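The averages and mean deviations quoted for the death-rate table can be recomputed directly from its first two columns. The sketch below, in Python, is not part of the original text; the 52 rates are transcribed from columns 1 and 2 of the table on p. 111, and the function name `mean_deviation` is illustrative.

```python
from statistics import mean, median

# Column 1 (a), read downwards, and column 2 (b), read downwards.
col_a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
         181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
col_b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
         160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
rates = col_a + col_b


def mean_deviation(values, centre):
    """Average of the deviations from `centre`, all taken positively."""
    return sum(abs(v - centre) for v in values) / len(values)


med = median(rates)   # half-way between the 26th and 27th rates
avg = mean(rates)     # 9029 / 52

md_from_median = mean_deviation(rates, med)
md_from_average = mean_deviation(rates, avg)

# Short way to the average: excess over 173 is (434 - 401) / 52.
excess = (sum(a - 173 for a in col_a) - sum(173 - b for b in col_b)) / len(rates)

print("median", med, "average", round(avg, 2))
print("mean deviation from median", round(md_from_median, 2))
print("mean deviation from average", round(md_from_average, 2))
print("average by the short way", round(173 + excess, 2))
```

The deviation from the median comes out slightly the smaller of the two, as the minimum property on p. 104 requires.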


[Margin: Standard deviation.]

In the mathematical treatment of statistical groups it is
found inconvenient to handle these absolute deviations since
in algebra they appear some as positive and others as negative,
and when the theory of probability is applied it is found that
the importance of the deviation depends on its square and not
on its first power. Accordingly the average of the squares of
the deviations from the arithmetical average of the group is
taken, and the square root of the average obtained is called the
standard deviation of the group; this measurement
of dispersion is in general use and is denoted by the
letter $\sigma$. It can be calculated by writing down the deviations
exactly, but the procedure is greatly simplified as follows. Let
$x_1, x_2 \ldots x_n$ be the measurements, $x_0$ the central quantity
from which the deviations are most conveniently measured.
Write $d_1 = x_1 - x_0$, $d_2 = x_2 - x_0$, etc., i.e. for the deviations
as tabulated. Let $\bar{x}$ be the arithmetic average of the group,
so that $n\bar{x} = x_1 + x_2 + \ldots + x_n$; and let $d_0 = \bar{x} - x_0$, so
that $nd_0 = (x_1 - x_0) + (x_2 - x_0) + \ldots = d_1 + d_2 + \ldots + d_n$.

Then by definition

\[
\begin{aligned}
\sigma^2 &= \{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \ldots + (x_n - \bar{x})^2\} \div n \\
&= \{(d_1 - d_0)^2 + (d_2 - d_0)^2 + \ldots\} \div n \\
&= \{d_1^2 + d_2^2 + \ldots - 2d_0(d_1 + d_2 + \ldots) + nd_0^2\} \div n \\
&= \{d_1^2 + d_2^2 + \ldots - nd_0^2\} \div n, \text{ since } d_1 + d_2 + \ldots = nd_0,
\end{aligned}
\]

and

\[ \sigma = \sqrt{\frac{\Sigma d^2}{n} - d_0^2}, \]

where $\Sigma d^2$ is written for $d_1^2 + d_2^2 + \ldots + d_n^2$.

In the table $\bar{x} = 173.63$, $x_0 = 173$, $d_0 = .63$.

$d_1^2, d_2^2 \ldots$ are given in columns 5 and 6, and $\Sigma d^2 = 26{,}491$.

$\therefore \sigma = \sqrt{26{,}491 \div 52 - .63^2} = 22.56$ approx.

The standard deviation is always measured in relation to
the arithmetical average, not to the median.

[Margin: Quartile deviation.]

A much simpler measurement of dispersion is obtained by
the use of the quartiles. The difference between the quartiles
is evidently related to the dispersion, though it has the weakness
that the same measurement would be obtained from groups
whose quartiles were the same, however the observations
between the quartiles were distributed, and however far the
observations outside the quartiles were placed. It is therefore
much less sensitive than the mean or the standard deviations.
The measurement used, however, is not the whole distance
between the quartiles but half that distance, and
we may call the half-distance between the quartiles
the quartile deviation; it is commonly denoted by the letter $r$.
$r$ is approximately, but not in general exactly, equal to the
median of the deviations. In the table the quartiles are 159½
and 181½, the distance between them is 22 and $\therefore r = 11$.
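Both the standard deviation from an assumed origin and the quartile deviation may be recomputed from the 52 death-rates. The sketch below, in Python, is not part of the original text; the rates are transcribed from columns 1 and 2 of the table on p. 111 and the variable names are illustrative. Run on the figures as transcribed here it gives a standard deviation of about 22.55, marginally under the printed 22.56, since the squares of the transcribed deviations add to 26,471 rather than the 26,491 printed at the foot of columns 5 and 6.

```python
import math

col_a = [244, 233, 226, 209, 206, 201, 196, 196, 196, 191, 183, 182, 182,
         181, 179, 177, 177, 177, 176, 176, 176, 174, 174, 174, 173, 173]
col_b = [136, 139, 141, 143, 144, 145, 149, 150, 151, 152, 154, 155, 159,
         160, 164, 165, 166, 166, 167, 169, 169, 169, 170, 170, 172, 172]
rates = col_a + col_b
n = len(rates)

# Short method: deviations d from the convenient origin x0 = 173.
x0 = 173
d = [x - x0 for x in rates]
d0 = sum(d) / n                          # average excess over 173
sum_of_squares = sum(v * v for v in d)   # total of columns 5 and 6
sigma_short = math.sqrt(sum_of_squares / n - d0 ** 2)

# Direct method: deviations measured from the arithmetic average itself.
average = sum(rates) / n
sigma_direct = math.sqrt(sum((x - average) ** 2 for x in rates) / n)

# Quartiles for 52 = 4n cases with n = 13: half-way between the 13th and
# 14th entries from the bottom, and between the 39th and 40th.
ordered = sorted(rates)
lower_quartile = (ordered[12] + ordered[13]) / 2
upper_quartile = (ordered[38] + ordered[39]) / 2
quartile_deviation = (upper_quartile - lower_quartile) / 2

print("sum of squares from 173:", sum_of_squares)
print("sigma, short and direct:", round(sigma_short, 2), round(sigma_direct, 2))
print("quartiles:", lower_quartile, upper_quartile, " r =", quartile_deviation)
```

The agreement of the two values of the standard deviation illustrates the identity derived above, whatever origin is chosen.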

The median is not necessarily half-way between the quartiles.
In the case before us this half-way mark is ½(159½ + 181½) = 170½,
and the quartiles are 170½ ± 11. We may then describe the
group very simply, as follows: the arithmetic average is 173.6
(or the median is 173) and half the observations are within the
range 170½ ± 11.

In  a  symmetrical  group  the  arithmetic  average  and  the  mode 
are  coincident,  and  r  is  called  the  probable  error,  a  term  that  is 
convenient  in  some  respects,  but  suggests  misleading  ideas. 

[Margin: Graded data.]

If the data are given in grades a modification of method is
necessary and the measurements can only be approximate.
Take for example the table of ages on p. 86. The median and
quartiles have already been found (p. 107) as
41.80, 33.45, and 52.34 years. The quartile
deviation is therefore ½(52.34 - 33.45) = 9.45 years, and half
the cases are in the range 42.90 ± 9.45 years. The deviations
from 42½ years are given in column 2. If we assume all the
entries in each grade to be concentrated at the middle point
of the grade, column 4 shows the aggregate deviations in each
grade and the sum of the numbers irrespective of sign, viz.
1155 + 926 = 2081, is the total of the deviations. The mean
deviation from 42½ years is then approximately 2081 ÷ 1000
of 5 years = 10.40 years. A small correction is needed to
obtain the mean deviation from the median or from the average
(43.6 . . .). Rather troublesome additions are needed to allow
for the deviations of the 136 entries in the zero grade and to
correct for the supposed concentration at the middle points.
These become negligible if the grading is sufficiently fine (entries
for every year would be sufficient in this case), and it is only
then that the use of the mean deviation for graded data is
recommended. The origin should be taken at the centre of
that grade which contains the average or median, whichever is
the starting-point for measuring the deviations.

No new principles are involved in the calculation of the
standard deviation in such cases, when the grading is fine.
Examples are given in Part II, Chap. I below.

[Margin: Mean difference.]

Professor Corrado Gini has introduced a new measurement
of variation (Variabilità e Mutabilità; Fascicolo 1°; Bologna,
1912, pp. 19 seq.). He contends that the problem that arises
in the study of the variability of demographic, anthropological,
biological or economic characters is How much do the different
magnitudes differ between themselves? and not How much do
diverse measurements differ from their arithmetic mean? The
second question is appropriate in physical science, but not in
the description of groups. Accordingly he proposes as a
measurement the arithmetic mean of the ½n(n - 1) differences
that are to be found between n quantities. This
we may call the mean difference and denote it by
the letter g. It has not yet come into general use, possibly
because (except in the simplest cases) the arithmetic involved in
its calculation is indirect and rather arduous; but it cannot be
denied that the conception is simple and logical.

Let $a_1, a_2 \dots a_n$ be n quantities, arranged in ascending order.
Then

\[
\begin{aligned}
g \times \tfrac{1}{2}n(n-1)
 ={}& (a_n - a_1) + (a_n - a_2) + \dots + (a_n - a_{n-2}) + (a_n - a_{n-1}) \\
 &+ (a_{n-1} - a_1) + (a_{n-1} - a_2) + \dots + (a_{n-1} - a_{n-2}) \\
 &+ \dots \\
 &+ (a_3 - a_1) + (a_3 - a_2) \\
 &+ (a_2 - a_1) \\
 ={}& (n-1)a_n + (n-3)a_{n-1} + (n-5)a_{n-2} + \dots \\
 &+ (1-n)a_1 + (3-n)a_2 + (5-n)a_3 + \dots \\
 ={}& (n-1)(a_n - a_1) + (n-3)(a_{n-1} - a_2) + (n-5)(a_{n-2} - a_3) + \dots
\end{aligned}
\]

The computation is readily performed as in columns 7, 8, 9 of
the table, p. 111, where n is an even number. If n is odd the
central number occurs by itself with a zero multiplier.
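
The reduction above can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names and the sample figures are hypothetical, chosen only to show that the direct average of all pair differences agrees with the form using the multipliers (n − 1), (n − 3), (n − 5) . . .

```python
from fractions import Fraction
from itertools import combinations


def mean_difference_pairs(values):
    """Arithmetic mean of the n(n-1)/2 differences between all pairs."""
    pairs = list(combinations(values, 2))
    return Fraction(sum(abs(a - b) for a, b in pairs), len(pairs))


def mean_difference_multipliers(values):
    """Same quantity from (n-1)(a_n - a_1) + (n-3)(a_(n-1) - a_2) + ..."""
    a = sorted(values)
    n = len(a)
    total = 0
    # When n is odd the central value is skipped: its multiplier is zero.
    for i in range(n // 2):
        total += (n - 1 - 2 * i) * (a[n - 1 - i] - a[i])
    return Fraction(total, n * (n - 1) // 2)


even_sample = [3, 7, 8, 12, 20, 21]   # hypothetical figures, n even
odd_sample = [1, 4, 9, 16, 25]        # hypothetical figures, n odd

g_even = mean_difference_multipliers(even_sample)
g_odd = mean_difference_multipliers(odd_sample)
```

Exact fractions are used so that the two routes can be compared without rounding.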
The relation between g (the mean difference) and η (the mean
deviation) can be exhibited as follows :—

Let $d_1, d_2 \dots d_n$ be the differences, all taken as positive,
between the median and $a_1, a_2 \dots a_n$.

Then $a_n - a_1 = d_1 + d_n$; $a_{n-1} - a_2 = d_2 + d_{n-1}$, etc.,

and
\[
g = \frac{1}{n}\Bigl\{ 2(d_1 + d_n) + 2\cdot\frac{n-3}{n-1}(d_2 + d_{n-1}) + 2\cdot\frac{n-5}{n-1}(d_3 + d_{n-2}) + \dots \Bigr\},
\]
while
\[
\eta = \frac{1}{n}\bigl\{ d_1 + d_n + d_2 + d_{n-1} + d_3 + d_{n-2} + \dots \bigr\}.
\]

In g more than average importance is given to the extreme
variations, and g is always greater than η. E.g., if the observations
are spaced at equal intervals (k), it can be shown that g is
approximately η × 4/3; for, in this case, if n = 2m + 1, it is found that

\[
g = \tfrac{2}{3}(m+1)k, \qquad
\eta = \frac{m(m+1)}{2m+1}\,k, \qquad
g \div \eta = \tfrac{4}{3}\Bigl(1 + \frac{1}{2m}\Bigr);
\]

also g − η is approximately ⅙mk.
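
The equal-interval case can be verified directly. The sketch below is an illustration only (the function name `g_and_eta` is hypothetical): it builds n = 2m + 1 values spaced k apart, computes g from all pairs and η from the median, and compares both with the closed forms just stated.

```python
from fractions import Fraction
from itertools import combinations
from statistics import median


def g_and_eta(m, k=1):
    """Mean difference g and mean deviation eta (from the median)
    for n = 2m + 1 observations spaced at equal intervals k."""
    n = 2 * m + 1
    values = [k * i for i in range(1, n + 1)]
    pairs = list(combinations(values, 2))
    g = Fraction(sum(b - a for a, b in pairs), len(pairs))
    mid = median(values)              # the central observation, n being odd
    eta = Fraction(sum(abs(v - mid) for v in values), n)
    return g, eta


g, eta = g_and_eta(m=10, k=5)
ratio = g / eta                       # close to 4/3 when m is large
```

For m = 10 the ratio is 4/3 × (1 + 1/20) = 7/5, already near the limiting value 4/3.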


If the instances are entered, not singly, but as $y_1$ cases at $a_1$,
$y_2$ cases at $a_2$ . . . $y_t$ cases at $a_t$, where $y_1 + y_2 + \dots + y_t = N$,
the working is more complicated. It can be shown that

\[
\begin{aligned}
g \times \tfrac{1}{2}N(N-1)
 ={}& y_t d_t (N - y_t) + y_{t-1} d_{t-1}(N - 2y_t - y_{t-1})
      + y_{t-2} d_{t-2}(N - 2y_t - 2y_{t-1} - y_{t-2}) + \dots \\
 &+ y_1 d_1 (N - y_1) + y_2 d_2 (N - 2y_1 - y_2)
      + y_3 d_3 (N - 2y_1 - 2y_2 - y_3) + \dots
\end{aligned}
\]

where the d's in the first and second lines are the differences to
quantities above and below the median respectively. The factors
are readily computed and arranged in a table.*
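
A short Python sketch of this grouped working follows. It is an illustration under stated assumptions rather than the tabular layout referred to in the text: the function name, the sample values and the choice of origin are hypothetical. The result is compared with the brute-force average over every pair of individual instances.

```python
from fractions import Fraction
from itertools import combinations


def grouped_mean_difference(values, counts, origin):
    """Mean difference g when y_j instances stand at each value a_j.

    `origin` plays the part of the median: d_j is the distance of a_j
    from it, and groups above and below it are summed separately.
    """
    N = sum(counts)
    total = Fraction(0)
    above = 0                                  # instances already counted above
    for a, y in sorted(zip(values, counts), reverse=True):
        if a <= origin:
            break
        total += y * (a - origin) * (N - 2 * above - y)
        above += y
    below = 0                                  # instances already counted below
    for a, y in sorted(zip(values, counts)):
        if a >= origin:
            break
        total += y * (origin - a) * (N - 2 * below - y)
        below += y
    return total / (N * (N - 1) // 2)


values = [10, 12, 15, 20]     # hypothetical grades
counts = [2, 3, 4, 1]         # hypothetical numbers of cases at each grade
g_grouped = grouped_mean_difference(values, counts, origin=Fraction(27, 2))

# Brute-force check on the expanded list of single instances.
expanded = [a for a, y in zip(values, counts) for _ in range(y)]
pairs = list(combinations(expanded, 2))
g_direct = Fraction(sum(abs(p - q) for p, q in pairs), len(pairs))
```

Here the origin 13½ is the median of the ten expanded instances; the contributions of the origin itself cancel in the sum, so the groups need only be classed correctly as above or below it.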

When measurements are distributed according to the normal curve
of error (Part II, Chap. II) we have the following relations :—
$\eta = \sigma\sqrt{2/\pi} = \sigma \times .798\ldots$, $r = \sigma \times .6745$,
$g = \eta\sqrt{2} = \eta \times 1.414\ldots$ These
relations are often obtained approximately in other distributions.
Thus on p. 111, η = .7σ, g = η × 1.41; but r = .5σ only.

If, following Professor Gini’s idea, we take the square root of
the average of squares of all the differences, we obtain (whatever
the distribution) the quantity $\sigma\sqrt{2n/(n-1)}$, or $\sigma\sqrt{2}$ very nearly.
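
Both statements lend themselves to a quick numerical check. The sketch below is illustrative only: the sample figures are hypothetical, and the value 2σ/√π for the mean difference of a normal distribution is a standard result supplied here as an assumption, not taken from the text.

```python
import math
from itertools import combinations
from statistics import pstdev


def rms_difference(values):
    """Square root of the average of the squares of all pair differences."""
    pairs = list(combinations(values, 2))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


sample = [3, 7, 8, 12, 20, 21]          # hypothetical figures
n = len(sample)
sigma = pstdev(sample)                  # standard deviation with divisor n
expected = sigma * math.sqrt(2 * n / (n - 1))

# Normal-curve constants, in units of sigma.
eta_over_sigma = math.sqrt(2 / math.pi)     # mean deviation
g_over_sigma = 2 / math.sqrt(math.pi)       # mean difference (assumed standard result)
```

The first identity holds for any set of figures, which is the point of the remark "whatever the distribution".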


*  The working of the formula here given differs in an unimportant way
from that used by Gini, loc. cit., p. 30 and foot-note on p. 29.

So far all the measurements of dispersion have been expressed
as concrete quantities, as so many shillings, years,
points on a scale, etc. It is sometimes advantageous to express
them in relation to a mean. Thus if the median and quartiles
of a wage group were 30s., 40s. and 50s., the quartile deviation
is ¼ of the median, while in another group, say 35s., 45s., 55s.,
it would be 2/9; it is, in fact, reasonable to regard the second
group as being less dispersed than the first, though their
quartile deviations are equal. Possible measurements of this
class are

(a) Quartile deviation ÷ Mean of quartiles,*
(b) Mean deviation ÷ Median,
(c) Standard deviation ÷ Arithmetic average;

but the only measurement at all generally used is the standard
deviation expressed as a percentage of the arithmetic average
(i.e. (c) × 100) and this is called the coefficient of variation. In
the table on p. 111 it is 22.56 × 100 ÷ 173.63 = 13.0.
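
The arithmetic of these relative measures is short enough to set out as a sketch. The function names below are hypothetical; the figures are those quoted in the paragraph above.

```python
from fractions import Fraction


def relative_quartile_dispersion(lower_quartile, upper_quartile):
    """Measure (a): quartile deviation divided by the mean of the quartiles."""
    return Fraction(upper_quartile - lower_quartile,
                    upper_quartile + lower_quartile)


def coefficient_of_variation(standard_deviation, arithmetic_average):
    """Measure (c) x 100: standard deviation as a percentage of the average."""
    return 100 * standard_deviation / arithmetic_average


first_group = relative_quartile_dispersion(30, 50)    # quartiles 30s. and 50s.
second_group = relative_quartile_dispersion(35, 55)   # quartiles 35s. and 55s.
cv = coefficient_of_variation(22.56, 173.63)
```

In both wage groups the median coincides with the mean of the quartiles, so measure (a) gives the same ¼ and 2/9 as the ratios to the median quoted in the text.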

Asymmetry or skewness of a curve is indicated when the
mode, median and arithmetic average do not coincide. It is
shown more definitely when the sum of the positive deviations
from the median is not numerically equal to the sum of the
negative deviations; it is also shown when the quartiles, or
pairs of deciles, are not equidistant from the median. Any of
these inequalities could be made into a measurement
of skewness. Skewness, relating to the
shape, and not to the size, of a curve is appropriately measured
by an absolute quantity (resembling the eccentricity of an
ellipse), and we therefore need a ratio of two concrete
measurements. The simplest to compute is as follows: let q2 be the
excess of the upper quartile over the median, and q1 the excess
of the median over the lower quartile; then
$s = \dfrac{q_2 - q_1}{q_2 + q_1}$ is a
measure of skewness.† If the curve is symmetrical, q2 = q1
and s = 0; if q2 > q1, s is positive, and if q2 < q1, s is negative;
s becomes + 1, if q1 = 0, that is if the median and lower
quartile coincide, and s becomes − 1, if q2 = 0. s is therefore a
measurement which never exceeds 1 numerically, and has a
definite significance at zero and at its extreme values. In the
table on p. 111, q2 = 9, q1 = 13, s = − 4/22 = − .18. In the
table of ages (see pp. 86 and 107) q2 = 10.54, q1 = 8.35, s = .12.
The significance of various values can only be obtained by
experience, but it may be suggested that .1 is a moderate
degree of skewness, and .3 a considerable degree.

*  In earlier editions I called this quantity the dispersion. It has the
advantage that it is necessarily not greater than 1.
†  See also p. 251.
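
As a minimal sketch of this measure (the function name is hypothetical), the two worked cases and the extreme values can be reproduced as follows.

```python
def quartile_skewness(q1, q2):
    """s = (q2 - q1) / (q2 + q1).

    q1: excess of the median over the lower quartile.
    q2: excess of the upper quartile over the median.
    """
    return (q2 - q1) / (q2 + q1)


s_table = quartile_skewness(q1=13, q2=9)         # table on p. 111
s_ages = quartile_skewness(q1=8.35, q2=10.54)    # table of ages
s_upper_limit = quartile_skewness(q1=0, q2=5)    # median and lower quartile coincide
s_lower_limit = quartile_skewness(q1=5, q2=0)    # median and upper quartile coincide
```

The first case gives − 4/22, and the second about .116, which rounds to the .12 of the text.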

It  should  be  noticed  that  the  three  characteristics  of  a 
group  can  be  measured  simply  from  the  quartiles  and  median ; 
the  median  for  the  central  position,  the  quartile  deviation  for  the 
dispersion,  and  the  measurement  just  discussed  for  the  skewness. 


Some  Examples  of  the  Application  of  Averages 


If our analysis of the nature and use of averages is complete
and if averages are of widely extended use, we
should now be able to express almost any group
of figures by a few well-chosen numbers of definite significance.

To  apply  a  somewhat  severe  test  at  first,  let  us  choose 
a  familiar  example  from  ordinary  life,  and  consider  how  a 
suburban  business  man  might  test  the  merits  of 
two  railway  systems,  by  one  of  which  he  intended 
to  take  a  season  ticket. 

The following table gives the train service between Leatherhead
and London in 1898 :—


Train Service—Leatherhead to London.

Number of Minutes to Journey.

Waterloo—

Down—60, 50, 52, 48, 47, 61, 50, 44, 48, 53, 45, 42, 45, 49, 43, 48, 42, 43.
Sundays—50, 50, 47, 49, 50.

Up—51, 46, 51, 48, 43, 44, 48, 48, 64, 45, 48, 47, 45, 47, 46, 47.
Sundays—48, 48, 51, 51, 51.

London Bridge—

Down—67, 65, 65, 61, 74, 51, 56, 66, 65, 53, 59, 41, 49, 44, 58, 57, 56, 67, 80.
Sundays—67, 52, 66, 68, 88, 65, 65, 68, 65.

Up—69, 57, 53, 58, 54, 41, 58, 52, 42, 40, 55, 67, 79, 98, 69, 66, 68, 64, 71.
Sundays—72, 71, 69, 70, 62, 81, 73, 73.

Victoria—

Down—77, 65, 55, 76, 77, SS, 48, 53, 46, 69, 89, 54, 82, 71, 90.
Sundays—92, 45, 81, 84, 78, 61, 85, 83, 85.

Up—87, 65, 69, 69, 47, 48, 51, 83, 101, 58, 62, 61, 76, 103.
Sundays—81, 76, 80, 85, 85, 82, 94.


The following table gives us the necessary information


                                    London
                                    Bridge.   Victoria.   Waterloo.
                                     Min.       Min.        Min.
Average of four quickest trains -    41         46½         42½
Lower decile -                       47½        48          43
Median -                             65         77          48
Mode -                               65         . . .       48
Number of trains on week days -      38         29          34
General average                      63         73          48
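
The Waterloo column can be recomputed from the journey times listed earlier. The following Python sketch is an illustration only: the variable names are hypothetical, all trains (week days and Sundays) are pooled, and one garbled entry in the printed Up list is read as 51, which is an assumption.

```python
from statistics import mean, median, mode

# Journey times in minutes, Leatherhead to Waterloo.
waterloo = {
    "down": [60, 50, 52, 48, 47, 61, 50, 44, 48, 53,
             45, 42, 45, 49, 43, 48, 42, 43],
    "down_sunday": [50, 50, 47, 49, 50],
    # Third entry read as 51 (assumption; the printed figure is unclear).
    "up": [51, 46, 51, 48, 43, 44, 48, 48, 64, 45, 48, 47, 45, 47, 46, 47],
    "up_sunday": [48, 48, 51, 51, 51],
}

all_trains = [t for times in waterloo.values() for t in times]
weekday_count = len(waterloo["down"]) + len(waterloo["up"])

four_quickest = mean(sorted(all_trains)[:4])   # average of four quickest trains
middle = median(all_trains)
most_common = mode(all_trains)
general_average = mean(all_trains)
```

On these figures the general average comes to about 48½ minutes, which the table gives as 48.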
It is to be noticed that the statistical method is generally
limited  to  one  aspect  of  a  problem ;  the  question  of  punctuality 
might,  indeed,  be  easily  treated  statistically,  but  the  questions 
of  comfort  and  relative  picturesqueness  of  route  will  elude  our 
analysis. 

The  next  example  shows  a  method  of  throwing  into 
relief  the  characteristics  of  a  typical  group  of  sociological 
data. 

The adjoining table gives the wages recognised by the
Amalgamated Society of Engineers in many of
their branches in 1862 and 1891.


Amalgamated Society of Engineers,—Wages in 1862 and 1891,

Weekly, exclusive of Overtime.


                         1862.                 1891.
                         s.  d.                s.  d.
Accrington               27  0                 31  0
Ashford                  33  6                 30  0
Ashton-under-Lyne        29  3                 34  0
Bacup                    26  1                 28  0
Barrow-in-Furness        31  0                 34  9
Bath                     29  0                 31  0
Bedford                  27  0                 29  0
Bilston                  28  0                 30  0
Bingley                  24  0                 29  0
Birkenhead               29  0                 35  6
Birmingham               32  0                 36  0
Blackburn                27  6                 32  0
Bolton                   27  6                 {[illegible]  0,  32  0}
Bridgwater               24  6                 24  0
Brighton                 24  8 [fraction illegible]    29  0
Bristol                  31  0                 32  0
Burnley                  27  0                 30  0
Burton-on-Trent          25  0                 30  0
Bury                     28  3                 {30  0,  32  0}
Cardiff                  31  0                 34  0
Carlisle                 24  6                 30  0
Chepstow                 30  0                 34  0
Chester                  30  0                 32  0
Chowbent                 26  0                 32  0
Colne                    25  0                 31  0
Congleton                24  0                 28  0
Coventry                 28  0                 34  0
Crewe                    29  4                 30  0
Darlington               25  0                 31  6
Dartford                 34  0                 33  0
Darwen                   27  0                 32  0
Derby                    26  0                 29  0
Doncaster                28  6                 31  6
Dover                    35  6                 36  0
Enfield Lock             36  0                 40  6
Exeter                   23  0                 {28  0,  [illegible]  0}
Faversham                34  0                 33  0
Folkestone               34  0                 32  0
Frome                    24  0                 {27  0,  30  0}
Gainsborough             27  6                 28  0
Glossop                  27  2                 32  0
Gloucester               28  0                 32  0
Grantham                 28  6                 30  4
Grimsby                  28  0                 32  0
Halifax                  23  1                 31  0
Hanley                   28  3                 32  0
Hartlepool               26  0                 34  10
Heywood                  27  0                 {30  0,  34  0}
Holyhead                 32  0                 28  0
Huddersfield             26  0                 26  0
Hull                     27  6                 34  0
Hyde                     {30  0,  28  0}       {30  0,  28  0}
Ipswich                  28  6                 28  0
Keighley                 23  0                 27  0
Kidderminster            28  0                 30  0
Lancaster                25  0                 32  0
Leeds                    25  0                 30  0
Leicester                26  0                 31  6
Leigh                    27  9                 31  6
Lincoln                  26  7                 28  6
Liverpool                29  0                 34  0
Llanelly                 22  0                 26  0
Macclesfield             24  0                 29  6
Manchester               29  9                 35  0
Mexborough               27  0                 32  0
Middlesborough           25  0                 34  0
Middleton                29  5                 33  0
Milton and Elsecar       28  0                 34  0
Neath                    32  0                 30  0
Newark                   25  0                 29  0
Newcastle                25  0                 {35  0,  37  0}
Amalgamated Society of Engineers,—Wages in 1862 and 1891,

Weekly,  exclusive  of  Overtime  ( continued ). 


1862. 

1891. 

1862. 

1891.

s. 

d. 

s. 

d. 

s. 

d. 

s. 

d. 

New  Holland  - 

30 

8 

34 

0 

Stafford 

34 

0 

30 

0 

Newport  - 

30 

0 

32 

0 

Stalybridge 

28 

3  • 

S2 

0 

New  Town  (Stockport) 
Newton  Abbott  - 

29 

33 

0 

0 

32 

33 

0 

0 

Stockport  - 

28 

0  i 

32 

34 

0 

0 

Northampton 

26 

0 

32 

0 

Stockton-on-Tees 

24 

0 

36 

0 

Northfleet  • 

36 

0 

56 

0 

Stoke-on-Trent  - 

29 

0 

a2 

0 

North  and  So.  Shields 

26 

0 

35 

0 

Stroud  and  Thrupp 

26 

0 

30 

0 

Norwich  - 

32 

6 

29 

0 

Swindon  - 

31 

6 

3* 

6 

Nottingham 

27 

5 

34 

0 

Todmorden 

26 

0 

28 

0 

Oldbury  - 

23 

0 

34 

0 

Wakefield  - 

25 

0 

30 

0 

Oldham 

29 

0 

33 

0 

Warrington  - 

28 

0 

34 

0 

Peterborough 

28 

6 

33 

0 

Watford  - 

35 

0 

36 

0 

Plymouth  - 

32 

6 

33 

0 

Wednesbury 

26 

0 

3i 

0 

Pontypridd 

Portsmouth 

24 

35 

0 

0 

30 

31 

0 

0 

Whitehaven 

25 

0 

* — , 

tsJ 

C"'  00 

0 

0 

Preston 

27 

0 

32 

0 

Wigan 

28 

0 

34 

0 

Radclifle  Bridge 

27 

0 

{3° 
l  32 

0 

0 

Wolverhampton 

Wolverton 

78 

29 

0 

2 

33 

29 

0 

0 

Reading  « 

28 

0 

/32 
l  34 

0 

0 

Worcester  - 
Bermondsey 

3i 

35 

O 

4  > 

30 

0 

Ripley 

26 

0 

26 

6 

Blackwall  - 

34 

0 

Rotherham 

27 

6 

32 

0 

Bow  - 

36 

0 

Rugby 

32 

0 

(  28 
\32 

0 

0 

Greenwich 

King’s  Cross 

34 

36 

0 

0 

Rugeley  - 

24 

1 1 

30 

0 

Lambeth  - 

35 

8 

St  Helens  - 

28 

0 

{34 

0 

London,  E. 

35 

0 

•3S 

136 

0 

,,  N. 

35 

10 

0 

Sheffield  - 

28 

0 

36 

0 

Q 

,,  0. 

35 

0 

Shipley 

25 

9 

I28 

l3o 

0 

0 

w 

H  '  v  • 

Marylebone 

35 

33 

6 

0 

Shrewsbury 

30 

6 

32 

0 

Stratford  - 

f  35 

0 

Smethwick 

28 

0 

35 

0 

l  33 

6 

Southampton 

32 

0 

34 

6 

Tower  Hamlets 

36 

6 

Sowerby  Bridge 

24 

6 

30 

0 

Woolwich  - 

36 

0  ^ 

The following figures show the same in brief


                                1.          2.          3.
                              1862.*      1891.*      1891.†
                              s.  d.      s.  d.      s.  d.
Maximum                       36  6       40  6       . . .
Upper decile                  35  0       38  0       38  0
Upper quartile                31  4       34  0       36  0
Median                        28  0       32  0       34  3
Arithmetic average            28 10       32  4       33  4
Modes                         28  0      {30  0       . . .
                                         {32  0
Lower quartile                26  0       30  0       31  6
Lower decile                  24  6       28  6       30  0
Minimum                       22  0       24  0       . . .
Quartile deviation             2  8        2  0        2  3
Skewness, from quartiles       .25         0          − .22

*  Each branch counting as 1.

†  The numbers of members in each branch counted as receiving the
wage recognised there.

If the rates at each branch were not those actually paid to
all  members,  but  their  average,  while  the  actual  wages  were 
confined  within  small  limits  of  that  average,  the  figures  in  the 
last  column  would  be  little  affected. 

On comparing columns 1 and 2 it will be seen that not
only  have  all  the  averages  increased,  but  that  since  the  lower 
decile  and  quartile  have  increased  more  rapidly  than  the  upper, 
the  lower  half  has  also  gained  on  the  upper.  Again  the  wages 
are  grouped  more  closely  in  column  2  than  in  column  1. 
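
The quartile deviations and the skewness figures in the summary can be recovered from the printed quartiles and medians. The sketch below is illustrative (the helper names are hypothetical); it works in pence, with 12 pence to the shilling.

```python
def to_pence(shillings, pence):
    """Convert a wage in shillings and pence to pence."""
    return 12 * shillings + pence


def quartile_summary(lower, median, upper):
    """Return (quartile deviation as (s, d), skewness from quartiles)."""
    lq, md, uq = to_pence(*lower), to_pence(*median), to_pence(*upper)
    deviation = (uq - lq) / 2                       # in pence
    skewness = ((uq - md) - (md - lq)) / (uq - lq)
    return divmod(deviation, 12), round(skewness, 2)


col_1862 = quartile_summary((26, 0), (28, 0), (31, 4))             # column 1
col_1891 = quartile_summary((30, 0), (32, 0), (34, 0))             # column 2
col_1891_members = quartile_summary((31, 6), (34, 3), (36, 0))     # column 3
```

The three results are 2s. 8d. with skewness .25, 2s. 0d. with skewness 0, and 2s. 3d. with skewness − .22, agreeing with the last two rows of the summary.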


Group C of Tabulation.—It was necessary to postpone
the tabulation of non-numerical or descriptive answers till we
had finished our discussion of averages. The following
detailed example shows how the median,
etc., can be used to give a short description of a
large group of adjectival answers.

In  1891  the  Amalgamated  Society  of  Engineers  obtained 
from  all  their  branches  answers  to  the  question  :  To  what 
extent  is  overtime  worked  ?  The  branch  secretaries  sent 
answers  which  may  be  tabulated  as  on  next  page. 

An inspection of the table here given will show sufficiently
the method of tabulation. The position of most of the answers
in an imaginary scale is fairly definite, except that
it is not always obvious where the numerical
answers should be placed; this must be decided either by
internal evidence or practical knowledge of the trade. The
same  adjectives  did  not  of  course  convey  exactly  the  same 
numerical  meaning  to  all  the  branch  secretaries  who  used  them, 
but  it  will  be  admitted  that  this  tabulation  gives  a  fairly  clear 
view  of  the  case,  and  that  the  method  of  medians  and  quartiles 
may  be  appropriately  applied.  Taking  the  member  of  a 
branch  as  the  unit  and  neglecting  the  unclassed  answers,  the 
median  is  “  Maximum  18  hours  in  4  weeks  ”  or  “  moderately,” 
the  lower  quartile  “  Very  little,”  and  the  upper  quartile  “  14 
hours  when  busy.”  Taking  the  branch  as  unit,  the  median  is 
“  Not  much,”  the  quartiles  are  “  Very  little  ”  and  “  When 
necessary  ”  or  “  Occasionally.” 

This  method,  which,  with  varying  degrees  of  precision,  is 
widely  applicable,  seems  to  afford  the  only  way  of  comparing 
two  such  groups  of  answers.  The  precision  attainable  is  to  be 
measured  by  the  distance  through  which  the  median  can  be 
shifted  by  making  reasonable  variations  in  the  scheme  of 
tabulation. 
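
The method can be sketched in Python for the case where the branch is taken as the unit. This is an illustration only: the function name is hypothetical, the answers are taken in the order of the tabulation, and the rule "first answer at which the running total reaches the required fraction" is one simple convention among several.

```python
# (answer, number of branches), in the order of the imaginary scale.
answers = [
    ("None", 4), ("Not worked", 1), ("Very little", 23),
    ("To very limited extent", 1), ("Very occasionally", 1),
    ("A little on repairs", 1), ("Little", 2),
    ("2 hours when necessary", 1), ("Seldom", 1), ("Small extent", 1),
    ("Seldom except on repairs", 1), ("Only on repairs", 2),
    ("Not much", 6), ("On repairs", 1), ("Not to any extent", 3),
    ("Not to a great extent", 2), ("Not general", 1),
    ("Not systematically", 2), ("In cases of breakdown or emergency", 7),
    ("2 hours regularly", 1), ("Chiefly on repairs", 1),
    ("Occasionally", 2), ("When necessary", 1), ("Casually", 2),
    ("A good deal on repairs", 1), ("Maximum 18 hours in 4 weeks", 1),
    ("Moderately", 3), ("Systematically in good trade", 1),
    ("Average about 5 hours a week", 1),
    ("Considerably in marine shops", 1),
    ("Systematically in dockyard", 1), ("General", 2),
    ("Systematically", 1), ("Great amount", 1), ("To a great extent", 1),
    ("Excessively", 1), ("9 hours a week", 1), ("10 hours a week", 1),
    ("12 hours a week (maximum)", 1), ("14 hours a week (when busy)", 1),
    ("10 to 18 hours a week", 1),
]


def ordinal_quantile(ranked, fraction):
    """First answer at which the running count reaches fraction x total."""
    total = sum(count for _, count in ranked)
    target = fraction * total
    running = 0
    for label, count in ranked:
        running += count
        if running >= target:
            return label
    raise ValueError("fraction must lie between 0 and 1")


total_branches = sum(count for _, count in answers)
lower_quartile = ordinal_quantile(answers, 0.25)
median_answer = ordinal_quantile(answers, 0.5)
upper_quartile = ordinal_quantile(answers, 0.75)
```

Moving an uncertain answer a few places up or down the list and recomputing shows how far the median can be shifted, which is the test of precision proposed above.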


Answers.                                 Number of     Number of
                                         Branches.     Members.

None                                         4            140
Not worked                                   1             78
Very little                                 23          4,836
To very limited extent                       1             63
Very occasionally                            1            350
A little on repairs                          1            500
Little                                       2             73
2 hours when necessary                       1             80
Seldom                                       1             59
Small extent                                 1             16
Seldom except on repairs                     1             66
Only on repairs                              2            216
Not much                                     6          1,125
On repairs                                   1            500
Not to any extent                            3            644
Not to a great extent                        2            162
Not general                                  1              7
Not systematically                           2             43
In cases of breakdown or emergency           7            606
2 hours regularly                            1            136
Chiefly on repairs                           1             20
Occasionally                                 2             90
When necessary                               1            348
Casually (sic)                               2            142
A good deal on repairs                       1             23
Maximum 18 hours in 4 weeks                  1          1,000
Moderately                                   3            262
Systematically in good trade                 1            200
Average about 5 hours a week                 1             96
Considerably in marine shops                 1            400
Systematically in dockyard                   1            650
General                                      2            146
Systematically                               1            693
Great amount                                 1            263
To a great extent                            1             72
Excessively                                  1            550
9 hours a week                               1             39
10   ,,     ,,                               1            106
12   ,,     ,,   (maximum)                   1            700
14   ,,     ,,   (when busy)                 1            106
10 to 18 hours a week                        1          5,000

Total                                       88         20,666

Unclassed :—
No answers                                  36          5,1[illegible]4
As little as possible                        1            250
Not so much lately                           1            160
In machine shops for six months              1             60
In steel works                               1            348

Now that we have the method of averages at our disposal
we may use it for tabulating and summarising a
group of figures.

Consider,  for  example,  the  answers  to  the  questions  issued 
by  the  Commissioners  on  Trade  Depression  in  1886. 

Four  of  the  questions  were  : — 

1.  Number  of  men  in  Society. 

2.  Number  out  of  work  in  1885. 

3.  Weekly  wage  in  1885. 

4.  Change  in  wages  between  1865  and  1885. 


The  following  table  shows  the  answers  given  by  the  branch 
secretaries  of  the  Amalgamated  Society  of  Engineers  : — 


1.                    2.          3.          4.                       5.
District.             No. in      No. Out     Current Wages,           Change between 1865 and 1885.
                      District,   of Work,    1885.
                      1885.       1885.

Belfast               1,100       130         28/ to 36/               Slight increase.
Coventry              2,500       230         31/6                     Contract work, 50 % decrease.
Dukinfield            170 +       20 +        31/                      Slight increase.
Dundee                1,400       45 %        25/ skilled,             Time work: 1865, 22/; ’72, 24/; ’80, 26/;
                                              15/ unskilled.           ’83, 24/; ’85, 25/.
Glasgow               28,000      4,000       26/                      Time wages, 5 % above 1864.
Glasgow (St Rollox)   1,600       250         . . .                    Rise in 1872-73 of 15 %; 1885 same as 1865.
Hartlepool            1,200       400         31/6                     Advance of 3/.
Glossop               135         10          32/
Liverpool             280         38          . . .                    Rise in 1872-73 of 74 %; 1885 same as 1865.
Monifieth             114         18          21/                      Skilled work: 1865, 24/; ’76, 27/; ’78, 25/;
                                                                       ’83, 28/; ’85, 25/.
Nottingham            4,000       600         34/ minimum.             1865, 28/; 1885, 34/.
Oldham                1,600       96          33/ average.             Increase of 5 %.
Oxford                45          . . .       33/
Paisley               800         . . .       28/6                     1865, 26/; 1885, 28/6.
Preston               630         40          28/                      None.
Preston               900         120         28/                      None.
Shipley               201         15          28/6;                    1865, 28/6; 1869-73, 32/; 1885, 28/6.
                                              24/ non-unionists.
Sowerby Bridge        1,120       43          28/                      1865-75, 25/6; 1875-85, 28/.
Sunderland            3,200       400         33/                      1864, 27/; ’74, 34/; 1875-85, between 31/ and 37/.
Swindon               6,050       2           31/6
Ulverston             45          . . .       31/                      1865, 26/; 1875, 31/.
Wednesbury            400         30          30/                      Increase of 2/.
Workington            170         70          28 to 36/                Increase of 30 %.

It is suggested that the following are the summary tables
which  should  be  inserted  in  a  report  dealing  with  the  answers. 

The  figures  are  given  here  for  only  one  society,  but  the 
tabulations  are  framed  so  as  to  include  all. 


TABLE I.—State of Employment.


Name of      Total Number* in     Number Out    Percentage     Median of the Percentages
Society.     Branches making      of Work.      Out of Work.   Out of Work in the
             Returns on                                        Various Branches.
             Employment.

A.S.E.       55,170               7,142         13             12
O.S.B.
&c.

*  Details of some of the most important branches should be added.

TABLE II.—Current Wages.


Name of      Average of Wages in Branches.    Quartiles of       Measure of Dispersion
Society.     Unweighted.      Weighted.       Branch Wages.      (v. p. [illegible] (a)).
             s.  d.           s.  d.          s.  d.   s.  d.

A.S.E.       30  0            29  7           28  0    32  0      1/15
O.S.B.
&c.

TABLE III.

A. Change of Wage between 1865 and 1885.


Name of      Number of Branches showing                  Median of     Percentages of Members in Branches showing
Society.     No        De-       No        Increase.     Percentage    No        De-       No        Increase.
             Answer.   crease.   Change.                 Increases.    Answer.   crease.   Change.

A.S.E.       4         1         5         13            10            11        4         6         79
O.S.B.
&c.

Verbal Summary.—In the great majority of cases a considerable
increase of wage took place between 1865 and 1885,
equivalent on the whole to a rise of about 10 per cent. The
figures are not sufficiently definite to give an exact average.

Table III.—B. Change of Wage between 1865 and the Maximum about 1873.

Table III.—C. Change of Wage between Maximum about 1873 and 1885.

(Tabulation as in III. A.)


CHAPTER  VII. 


THE  GRAPHIC  METHOD. 
1. General Purpose.

The two main methods of elementary statistics which ought
to be understood by all students or officials who handle figures,
which are easily within the grasp of all independently of
mathematical training, but are generally misunderstood or ignored by
the uninterested or the uninitiated, are the method of averages
and the method of diagrams or the graphic method. These two
are placed together because the uses of averages and diagrams
are nearly related. When we deal with large and complex
masses of figures we are unable to grasp them in
their entirety, however clearly they may be tabulated.
Any list of figures—the populations of different towns,
the death-rates at successive ages, the wages of many
workpeople, the imports for a series of years—becomes less
comprehensible as its length increases. A series of ten numbers can,
perhaps, be easily grasped, of twenty only with an effort; while
a printed list of figures for one hundred successive years leaves
hardly any impression on our mind at all; we cannot see the
wood for the trees. The test to which all questions as to the
use of averages should be referred is that the averages selected
should afford the best summary of the whole group in question
that the mind can grasp. When the meaning of the word
average was sufficiently extended, we found that we could select
three, four, or even ten suitable figures which adequately showed
the main features of any group. The main use of diagrams is
also to present large groups of figures so that they shall be
intelligible in their entirety, and the test for all diagrams is that
the diagram as drawn should afford the best view of the series
or group of figures that the eye can appreciate. Diagrams have
one use which averages have not, for it is only by a diagram that
a series of figures relating to successive years can be adequately
presented; but in reality they are less essential than averages, for
the  latter  often  have  an  existence  independently  of  the  figures 
from  which  they  are  derived,  representing  true  types  of  the 
quantities  which  are  being  measured ;  and  by  their  use  alone 
are  further  comparisons  of  complex  groups  made  possible  :  while 
diagrams,  on  the  other  hand,  might  be  dispensed  with,  being 
auxiliary  rather  than  essential,  merely  an  aid  to  the  eye  and 
a  means  of  saving  time. 

To connect this chapter more closely with the preceding, we
will show how the same group of figures, for
example the wages of a large group of workpeople,
may be represented by either method.

Consider  the  following  data  : — 


Numbers of workpeople earning—


Wages.                Number.     Group total.

From 15/ to 16/          200  }
  ,,  16/ ,, 17/         400  }
  ,,  17/ ,, 18/         100  }     1,000
  ,,  18/ ,, 19/         100  }
  ,,  19/ ,, 20/         200  }

  ,,  20/ ,, 21/         200  }
  ,,  21/ ,, 22/         300  }
  ,,  22/ ,, 23/         300  }     2,200
  ,,  23/ ,, 24/         500  }
  ,,  24/ ,, 25/         900  }

  ,,  25/ ,, 26/       1,200  }
  ,,  26/ ,, 27/         800  }
  ,,  27/ ,, 28/         700  }     3,500
  ,,  28/ ,, 29/         500  }
  ,,  29/ ,, 30/         300  }

  ,,  30/ ,, 31/         300  }
  ,,  31/ ,, 32/         400  }
  ,,  32/ ,, 33/         400  }     2,100
  ,,  33/ ,, 34/         500  }
  ,,  34/ ,, 35/         500  }

  ,,  35/ ,, 36/         600  }
  ,,  36/ ,, 37/         400  }
  ,,  37/ ,, 38/         100  }     1,200
  ,,  38/ ,, 39/          80  }
  ,,  39/ ,, 40/          20  }

Using  the  method  of  averages  we  should  replace  this  group 
by  the  following  figures  : — 

                             s.  d.
Average of all            -  27  6
   ,,      lowest 1,000   -  17  0
   ,,      highest 1,000  -  36  6
   ,,      middle 4,000   -  27  0

or

Median, 26/9; quartiles, 24/2, 32/.

Deciles, 20/, 23/6, 24/9, 25/8, 26/9, 28/2, 31/, 33/4, 35/4.

Mode, 25/3; secondary positions, 16/6, 36/.

or

Persons earning from    15/ to 20/   20/ to 25/   25/ to 30/   30/ to 35/   35/ to 40/

Percentages of all  -       10           22           35           21           12
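
A hedged sketch of how such figures can be obtained from the graded data is given below. The function names are hypothetical, and the working assumes that earners are spread evenly within each one-shilling grade; the printed figures appear to be rounded, so small differences of a penny or two are to be expected.

```python
# Numbers of workpeople in successive one-shilling grades, 15/ to 40/.
frequencies = [200, 400, 100, 100, 200,
               200, 300, 300, 500, 900,
               1200, 800, 700, 500, 300,
               300, 400, 400, 500, 500,
               600, 400, 100, 80, 20]
LOWEST = 15      # lower limit of the first grade, in shillings
WIDTH = 1        # width of each grade, in shillings


def grouped_quantile(freqs, fraction):
    """Wage below which the given fraction falls, by linear interpolation."""
    target = fraction * sum(freqs)
    cumulative = 0
    for i, f in enumerate(freqs):
        if cumulative + f >= target:
            return LOWEST + WIDTH * (i + (target - cumulative) / f)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


def grouped_mean(freqs):
    """Average wage, each grade being taken at its middle point."""
    total = sum(freqs)
    return sum(f * (LOWEST + WIDTH * (i + 0.5))
               for i, f in enumerate(freqs)) / total


median_wage = grouped_quantile(frequencies, 0.5)       # 26.75, i.e. 26/9
lower_quartile = grouped_quantile(frequencies, 0.25)   # about 24/3
average_wage = grouped_mean(frequencies)               # about 27/6
shares = [100 * sum(frequencies[i:i + 5]) // sum(frequencies)
          for i in range(0, 25, 5)]                    # percentages by 5/ groups
```

The median comes out exactly at 26s. 9d., and the five percentages reproduce the last line of the summary.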
This group is represented on the annexed diagram, an
example of the graphic representation of the relation between
two variable quantities. A figure similar to this
may be used to show marriage, or death-rates at
different ages, numbers of persons of various
statures, demand at different prices, or any such group of
homogeneous quantities. The same construction can be
used to show the changing values of any number in a series
of years. Draw a line parallel to the bottom of the page, and
mark equal intervals to represent a quantity which can have
many successive small increments, such as age, income, height,
price, time, and so on. This is called the axis of abscissae,
and the distance of a point measured from the zero position
along the line is called its abscissa. At right angles to this
line, parallel to the side of the paper, through the zero position
we draw another, called the axis of ordinates, and grade this
to correspond to the numbers possessing the qualities
represented by the abscissae; at each grade on the axis of abscissae,
draw lines at right angles to it, to represent on the chosen scale
the numbers at that grade; these lines are called the ordinates.
In the annexed diagram the abscissae represent the amounts
of wages, the ordinates the number of persons earning them.
Join the tops of the ordinates by straight lines and the diagram
is complete. In practice, when squared paper is used, without
drawing the ordinates their tops can be marked.

This  diagram  shows  at  one  glance  the  distribution  of  the 
wage-earners  according  to  their  wages.  A  small  number  earned 
between 15s. and 16s., a slightly larger group
between  16s.  and  17s.,  very  few  between  17s.  and 
19s.  Above  19s.  the  number  continually  rises; 
high  numbers  are  found  from  24s.  to  27s.,  the  highest  between 
25s.  and  26s.  The  line  falls  to  the  30s.  group,  but  not  so  low 
as  between  17s.  and  19s.,  then  it  rises  regularly  to  36s.,  and 
falls  rapidly  to  39s.  Here,  then,  we  have  the  main  group 
congregated  in  the  neighbourhood  of  25s.,  a  distinct  but  smaller 
group  at  36s.,  and  a  small  and  nearly  isolated  group  at  16s. ; 
representing  a  considerable  group  of  highly-skilled  men  between 
30s.  and  40s.,  the  great  mass  with  ordinary  skill  between  20s. 
and 30s., and a small group of incompetents at 16s. These
features  would  not  be  so  easily  seen  from  the  tabulated 
figures. 


It is to be noticed that the number tabulated as between
15s. and 16s. is represented by the ordinate at 15s. 6d., the
middle of the interval; if the original figures on which the
table was based had been given to the nearest 1d., the ordinate
should be drawn at 15s. 5½d. It is important that these middle
points should be accurately placed.
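
The placing of the ordinates can be sketched as follows. This is an illustration only: the function name and the parameter `recorded_to` are hypothetical, and the half-unit shift expresses the reasoning that figures given to the nearest penny and tabulated as 15s. to 16s. really cover 14s. 11½d. to 15s. 11½d.

```python
# Numbers of workpeople in successive one-shilling grades, 15/ to 40/.
frequencies = [200, 400, 100, 100, 200,
               200, 300, 300, 500, 900,
               1200, 800, 700, 500, 300,
               300, 400, 400, 500, 500,
               600, 400, 100, 80, 20]


def diagram_points(freqs, lowest=15.0, width=1.0, recorded_to=0.0):
    """(abscissa, ordinate) pairs for the tops of the ordinates.

    Abscissae are in shillings. `recorded_to` is the unit to which the
    original figures were rounded (0 when they are exact); each middle
    point is moved back by half that unit.
    """
    shift = recorded_to / 2
    return [(lowest + width * (i + 0.5) - shift, f)
            for i, f in enumerate(freqs)]


exact = diagram_points(frequencies)
to_nearest_penny = diagram_points(frequencies, recorded_to=1 / 12)

first_exact_in_pence = exact[0][0] * 12               # 186 pence = 15s. 6d.
first_shifted_in_pence = to_nearest_penny[0][0] * 12  # 185.5 pence = 15s. 5½d.
highest_point = max(exact, key=lambda point: point[1])
```

Joining these points in order by straight lines gives the diagram described; the highest point stands over 25s. 6d., the middle of the 25s. to 26s. grade.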

Continuity.

The use of the line joining the tops of the ordinates is twofold.
First, it enables the eye to judge relative heights more easily;
and secondly, it suggests the idea of continuity, which can be
better illustrated by the next diagram. In this the abscissae
represent ages, the ordinates the estimated numbers of persons
living at and above the ages at which they stand per thousand
inhabitants of England and Wales at the middle of the year 1891.
The ordinates were drawn at the points on the axis of abscissae
representing the middle of each year of age; but length of life
cannot be expressed exactly in years, or even in months, days, or
minutes. The intention of the diagram is to show the proportion
living above each age, and for this purpose the joining line should
have no breaks or sharp angles, but should suggest absolute
continuity.

In  practice,  it  is  useless  to  mark  in  the  points  for  smaller 
intervals  than  a  year,  for  the  eye  could  not  grasp  the  detail. 
It  is,  however,  implied  that  the  line  drawn  has  the  same  shape 
as  that  which  would  result  if  the  number  of  persons  was  infinite 
and  the  subdivision  by  age  infinitesimal. 


Estimated number per 1,000 of the population at and above the ages stated.

Age    No.    Age    No.    Age    No.    Age    No.    Age    No.
  0  1,000     16    628     32    346     49    152     65     47
  1    973     17    607     33    332     50    143     66     43
  2    949     18    587     34    318     51    135     67     38
  3    925     19    567     35    305     52    127     68     34
  4    901     20    547     36    292     53    119     69     31
  5    877     21    528     37    280     54    112     70     27
  6    854     22    510     38    268     55    104     71     24
  7    830     23    491     39    256     56     98     72     21
  8    807     24    474     40    244     57     91     73     18
  9    783     25    456     41    233     58     85     74     15
 10    760     26    439     42    222     59     79     75     13
 11    738     27    423     43    211     60     73     76     11
 12    715     28    407     44    201     61     67     77      9
 13    693     29    391     45    191     62     62     78      8
 14    671     30    376     46    181     63     57     79      6
 15    649     31    361     47    171     64     52     80      5
                             48    161

Calculated from the Census of 1891.

[Diagram: Numbers per 1,000 of the Population above Assigned Ages.
Vertical scale: numbers, 0 to 1,000. Horizontal scale: ages, 0 to 80.]
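Since each entry in the table counts all persons at and above the stated age, the number falling between two ages is found by subtraction. A minimal sketch in Python, using a few entries copied from the table; the function name is ours, chosen only for illustration.

```python
# A few entries from the table: estimated number per 1,000 of the
# population at and above each age (Census of 1891).
AT_AND_ABOVE = {0: 1000, 16: 628, 20: 547, 30: 376, 40: 244, 65: 47}

def per_thousand_between(lower_age, upper_age):
    """Number per 1,000 aged at least lower_age but under upper_age."""
    return AT_AND_ABOVE[lower_age] - AT_AND_ABOVE[upper_age]

print(per_thousand_between(20, 30))  # persons per 1,000 aged 20 and under 30
print(per_thousand_between(0, 16))   # persons per 1,000 under 16
```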

Apply  these  remarks  to  the  diagram  facing  p.  127.  Average 
earnings  for  a  year  will  not  be  reckoned  exactly  by  shillings 
or  even  pence ;  if  we  had  a  sufficient  number  of  instances  we 
should  get  regular  sequences  of  earners  at  successive  farthings, 
and  the  line  representing  them  would  have  no  sharp  angles, 
but  be  continually  curved.  The  figure  rightly  gives  the  eye 
this  impression  of  continuousness.  Similarly  in  the  diagram 
representing  exports  facing  p.  134,  the  line  correctly  gives  the 
impression  that  exports  are  continuous  day  by  day. 

By  an  obvious  step  we  may  suppose  that  the  unit  of  area, 
that  contained  between  vertical  lines  through  two  consecutive 
divisions on the axis of abscissae, and horizontal
lines  through  two  consecutive  divisions  on  the  axis 
of  ordinates,  represents  one  wage-earner,  and  it  is  then  easy 
to  see  that  the  area  contained  between  the  base  line,  the  curve, 
and  two  vertical  lines  through  the  points  marking  any  two 
amounts  of  wage  represents  the  total  number  earning  rates 
between  those  amounts. 
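The statement that areas under the joined line stand for numbers of earners can be tried numerically. The sketch below is ours and only illustrative: the wage figures are hypothetical, the function names are assumptions, and the dividing lines (median, quartiles) are located by simple bisection on the area.

```python
def height(xs, ys, x):
    """Height of the straight-line diagram through (xs, ys) at abscissa x."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")

def area_between(xs, ys, a, b):
    """Area bounded by the base line, the joined line, and verticals at a, b."""
    cuts = sorted({a, b, *[x for x in xs if a < x < b]})
    total = 0.0
    for left, right in zip(cuts, cuts[1:]):
        # Each strip is a trapezium, since the line is straight between cuts.
        total += (height(xs, ys, left) + height(xs, ys, right)) / 2 * (right - left)
    return total

def dividing_abscissa(xs, ys, fraction, tol=1e-9):
    """Abscissa whose vertical cuts off the given fraction of the whole area."""
    lo, hi = xs[0], xs[-1]
    target = fraction * area_between(xs, ys, lo, hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_between(xs, ys, xs[0], mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical diagram: wages in shillings and the ordinates above them.
wages = [15, 20, 25, 30, 35, 40]
ordinates = [0, 40, 120, 60, 30, 0]
print(dividing_abscissa(wages, ordinates, 0.5))   # position of the median
print(dividing_abscissa(wages, ordinates, 0.25))  # position of the lower quartile
```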

Hence the lines (diagram, p. 127) through M, the position of the
median, Q1, Q3 those of the quartiles, D1, D2, D3, D4, M, D6, D7,
D8, D9 of the deciles divide the area ABm1m2m3CD into two, four,
and ten equal areas respectively. The centre of gravity of this
figure lies on the vertical line through V, the average wage; and
the feet of the ordinates through the highest points m1, m2, m3
are at the modes.

Graded data.


When  the  grades  in  which  the  data  are  tabulated  are  wide 
it  is  better  to  use  the  method  of  the  next  diagram,  which  we 
may  call  a  block  diagram. 

This  and  the  drawing  underneath  it  illustrate  the  numbers 
of  married  men  distributed  by  age  which  are  given  on  p.  86. 

In  that  table  we  have  no  information  except 
that  such  a  proportion  are  as  old  as  twenty  years 
and  not  as  old  as  twenty-five  years,  etc.  This  is  precisely 
represented  by  constructing  a  rectangle  with  base  the  interval 
that  represents  five  years,  and  height  proportional  to  the 
number  recorded  within  that  interval.  The  method  of  the 
diagram  facing  p.  127  would  suggest  that  all  were  at  the 
middle  of  the  grade.  In  the  case  of  ages  we  know  that  the 
succession  of  numbers  year  by  year  ought  to  be  continuous, 
and  a  complete  representation  would  be  a  continuous  curve, 
such  that  the  area  standing  on  a  five  years’  interval  equals 
the  area  of  the  corresponding  rectangle.  Such  a  curve  is  drawn 
free-hand  on  the  diagram.  If  the  figure  is  such  as  to  leave 
little  margin  of  uncertainty  as  to  the  position  of  the  curve 
throughout,  then  the  curve  is  an  adequate  representation  of 
the  facts. 

The  data  may  also  be  represented  by  the  lower  diagram, 
where  the  crosses  show  the  information  as  recorded  in  the 
table.  These  crosses  are  joined  by  straight  lines ;  the  resulting 
figure  may,  if  the  phenomena  are  continuous,  be  replaced  by 
a  curve,  which  in  this  case  would  hardly  be  distinguishable 
from  the  straight  lines. 

Requisite accuracy.

The details of technique of diagram drawing, the position of the
scales, the devices for making the figure clear, and so on, can be
gathered from the various diagrams given in this chapter. The
degree of accuracy to
which  the  figures  should  be  marked,  whether  correct  to  a 
million,  a  thousand,  or  a  unit,  is  determined  simply  by  the 
power  of  the  eye  to  grasp  detail ;  in  most  of  those  here  given 
it  will  be  found  that  a  displacement  of  one  in  a  thousand  is 
perceptible,  and  this  is  the  ordinary  limit.  More  minute 
accuracy  is  useless,  for  it  is  not  the  function  of  diagrams  to 
dispense  with  lists  of  numbers,  but  only  to  enable  the  eye  to 
perceive  their  significant  features. 

[Diagram: Distribution by Age of Married Men, England and Wales,
1911. I. Block Diagram: numbers per 1,000 in 5-year grades.
II. Cumulative Diagram: numbers per 1,000 above the age shown.
Horizontal scale in both: ages.]

Before discussing the choice of scales on which the numbers are to
be represented, it is necessary to consider the ways in which a
diagram makes an impression on the eye. The eye can judge:
(1) distances; (2) ratios; (3) angles.

The dotted lines in the diagram facing p. 134 will illustrate
these points. (1) The eye is a fairly safe judge of distances;
there is very little doubt which of two points is the further from
the base line; when squared paper is used, a difference of 1 in
1000 is perceptible. The eye can also judge differences quickly.
In the figure the value of the exports in 1883 exceeded that in
1885 by more than the value in 1890 exceeded that in 1883. (2) It
can be seen that the value of exports approximately doubled
between 1862 and 1889; or that the value in 1878 is about
three-quarters of that in 1890. The accuracy with which the eye
can make such measurements is not great; it is not easy to detect
that the ratio of the values in 1873 and 1871 (1.095 : 1) is
greater than the ratio of the values in 1882 and 1880 (1.073 : 1);
but the general impression given by the diagram is partly made up
by unconscious calculations of this nature. To make these
observations accurately the method described on pp. 169 seq.
should be used. Notice that for these observations the insertion
of the base line is necessary; and, because they are made
unconsciously, a diagram showing movements over a series of years
without a base line gives an incorrect impression. (3) The
question, Was the increment greater in 1886-87 or in 1887-88? can
be more quickly answered by observing the angles than by noting
the differences. The line showing the latter change is steeper
(makes a greater angle with the horizontal) than the line showing
the former. Hence the latter increase is the greater; actually
£12,600,000 against £9,200,000. The most useful exercise of this
power, however, is to judge the dates at which the rate of
increase changed; thus the value of exports increased in 1862-63,
increased at a slower rate in 1863-64, and slower yet in 1864-65,
more rapidly in 1865-66; a slow fall followed in 1866-67, then an
increase began which is continually accelerated to 1871, and so
on. The line from 1872-76 is concave to the base line, showing an
accelerated fall; the concavity from 1879 to 1882 corresponds to a
retarded rise. The increases so shown are absolute or actual, not
relative or in ratio to the quantities at the beginning of each
period.
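The comparisons which the eye makes on the diagram can be repeated by arithmetic. The following Python sketch is ours; it uses annual values (in millions of pounds) copied from the table of exports in this chapter, and the function names are assumptions made for illustration.

```python
# Annual values of exports, 1 = £1,000,000, from the table of exports.
EXPORTS = {1862: 124.0, 1878: 192.8, 1883: 239.8, 1885: 213.1,
           1886: 212.7, 1887: 221.9, 1888: 234.5, 1889: 248.9, 1890: 263.5}

def difference(later, earlier):
    """Absolute change between two years (what distances and angles show)."""
    return round(EXPORTS[later] - EXPORTS[earlier], 1)

def ratio(later, earlier):
    """Relative size of two years (what needs the base line to be judged)."""
    return EXPORTS[later] / EXPORTS[earlier]

print(difference(1883, 1885), difference(1890, 1883))  # (1) distances
print(ratio(1889, 1862), ratio(1878, 1890))            # (2) ratios
print(difference(1887, 1886), difference(1888, 1887))  # (3) increments, i.e. slopes
```

As the years are equally spaced, the steeper of two lines is simply the one with the larger annual increment.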

Choice of scale.

It is difficult to lay down rules for the proper choice of the
scales by which the figure should be plotted out. It is only the
ratio between the horizontal and vertical scales that need be
considered. The figure must be
sufficiently  small  for  the  whole  of  it  to  be  visible  at  once ;  if  the 
figure  is  complicated,  relating  to  a  long  series  of  years  and 
varying  numbers,  minute  accuracy  must  be  sacrificed  to  this 
consideration.  Supposing  the  horizontal  scale  decided,  the 
vertical  scale  must  be  chosen  so  that  the  part  of  the  line  which 
shows  the  greatest  rate  of  increase  is  well  inclined  to  the  vertical, 
which  can  be  managed  by  making  the  scale  sufficiently  small ; 
and,  on  the  other  hand,  all  important  fluctuations  must  be 
clearly  visible,  for  which  the  scale  may  need  to  be  increased. 
Any  scale  which  satisfies  both  these  conditions  will  fulfil  its 
purpose.  The  page  opposite  shows  the  erroneous  impressions 
which  can  be  given  by  a  judicious  manipulation  of  the  scale 
and  by  the  omission  of  the  base  line.  The  diagrams,  which 
are  drawn  roughly,  all  represent  the  same  estimates  of  wages  in 
England  and  in  the  United  States  of  America  for  certain  years 
from 1860. Figure 1 sets the lines in proper relief. In Figure 2,
the  base  line  is  not  drawn  in  the  zero  position 
for  the  English  scale,  and  the  American  scale  is 
reduced;  the  consequence  is  that  English  wages 
appear  to  have  fluctuated  widely,  while  American  made  steady 
progress.  In  Figures  3,  4,  and  5  the  scales  are  doctored  and  the 
base  line  adjusted,  so  that  in  3  American  wages  seem  to  have 
caught  up  English,  in  5  exactly  the  reverse  is  the  case,  while  in 
4  wages  appear  to  have  moved  with  equal  rapidity  in  both 
countries.  An  examination  of  these  figures  will  show  that  the 
eye  cannot  be  trusted  to  supply  the  right  base  line,  or  to 
estimate  the  importance  of  fluctuations  without  it ;  and,  with 
certain  exceptions  to  be  mentioned  later,*  it  is  well  to  distrust 
all  those  numerous  diagrams,  where  space  has  been  economised 
at  the  expense  of  the  base  line. 
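The effect of a displaced base line on the apparent size of a change can be put in figures. This is a small sketch of our own in Python; the two values and the position of the false base line are hypothetical, chosen only to show the distortion.

```python
def apparent_ratio(first, second, base_line=0.0):
    """Ratio of the heights of two plotted points above the drawn base line."""
    return (second - base_line) / (first - base_line)

# Hypothetical values: a rise from 100 to 110.
print(apparent_ratio(100, 110))                # true base line at zero
print(apparent_ratio(100, 110, base_line=90))  # base line drawn at 90
```

A rise of one-tenth is made to look like a doubling when the base line is drawn at 90 instead of zero.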

Necessity of correct base line.

* See pp. 155 seq. and p. 171, infra.

[Diagram: The Same Figures Represented on Various Scales and with
Erroneous Base Lines. Figures 1 to 5. In each figure the scale for
English wages is on the left, that for American wages on the
right.]

Smoothing curves.

We can now pass on to the consideration of the smoothing of
curves, for which purpose the question of the “alleged
stationariness of our exports,” discussed by Sir R. Giffen in his
paper before the Royal Statistical Society in 1899, affords an
excellent illustration. The thin dotted line on the diagram
opposite shows the value of exports year by year, and the first
impression given by it is that exports have not grown in value in
recent years.

Total Declared Real Value of British and Irish Produce
Exported from the United Kingdom. 1 = £1,000,000.

                           Averages.
Year.  Value.    Three     Five      Ten
                 Yearly.   Yearly.   Yearly.
1855    95.7      ...       ...       ...
1856   115.8      ...       ...       ...
1857   122.0     111.2      ...       ...
1858   116.6     118.1      ...       ...
1859   130.4     123.0     116.1      ...
1860   135.9     127.6     124.1      ...
1861   125.1     130.5     126.0      ...
1862   124.0     128.3     126.4      ...
1863   146.5     131.9     132.4      ...
1864   160.4     143.7     138.4     127.2
1865   165.8     157.6     144.4     134.3
1866   188.9     171.7     157.2     141.6
1867   181.0     178.6     168.7     147.5
1868   179.7     183.2     175.1     153.8
1869   190.0     183.6     181.0     159.8
1870   199.6     189.8     187.8     165.9
1871   223.1     204.2     194.6     175.7
1872   256.3     226.3     209.7     188.9
1873   255.2     244.9     224.8     200.0
1874   239.6     250.4     234.7     207.9
1875   223.5     239.4     239.6     213.7
1876   200.6     221.0     235.1     214.9
1877   198.9     207.7     223.7     216.7
1878   192.8     197.4     210.9     218.0
1879   191.5     194.4     201.4     218.1
1880   223.1     202.5     201.3     220.5
1881   234.0     216.2     208.2     221.6
1882   241.5     232.9     216.7     220.1
1883   239.8     238.4     226.0     218.6
1884   233.0     238.1     234.3     217.9
1885   213.1     228.6     232.3     216.9
1886   212.7     219.6     228.0     218.1
1887   221.9     215.6     224.1     220.4
1888   234.5     223.0     223.0     224.5
1889   248.9     235.1     226.2     230.2
1890   263.5     249.0     236.3     234.2
1891   247.2     253.2     243.2     235.5
1892   227.1     245.9     244.2     234.1
1893   218.1     230.8     240.9     231.9
1894   215.8     220.3     234.3     230.2
1895   225.9     219.9     226.8     231.4
1896   240.1     227.3     225.4     234.1
1897   234.3     233.4     226.8     235.4
1898   233.4     235.9     229.8     235.3
1899   255.3     241.0     237.8     236.1
1900   283.6*    257.4     249.3     238.1
1901   270.9*    269.9     255.5     240.5
1902   277.7*    277.4     264.2     245.5
1903   286.5*    278.4     274.8     252.3
1904   296.3*    286.8     283.0     260.4
1905   324.4*    302.4     291.2     270.2
1906   367.0*    329.2     310.4     282.9

* Not including the value of ships exported.

Sir Robert Giffen gave the following table:


              Average Annual Value of Exports.

1855-57                £134,000,000
1865-67                 228,000,000
1875-77                 264,000,000
1885-87                 274,000,000
1895-97                 292,000,000

and from this he deduced “that all through there is an increase,
and that the only sign of stationariness is an increase at a less
rate in the last periods than in the earlier periods.”

The Saturday Review* wrote “that such a conclusion is grossly
misleading,” for the figures are merely triennial averages of
selected years showing a happy coincidence; “why was not 1898
included?” An inspection of the numbers does not show us the
answer to this criticism, but on the diagram the whole
circumstances are visible at a glance. Since 1865 three great
waves have been completed. The maximum of 1872, due to the
inflated prices of that year, is very high, but that of 1890 is
greater than any previous figure, while the maximum in 1882 is
comparatively low. The minima increase throughout; those of 1868,
1879, 1886 show a regular progression, which falls off greatly in
1891. In 1894-96 it looked as if another decennial cycle was in
progress, but this was checked in 1897. Since the discussion, the
returns for the successive years to 1906 have shown an increase,
surpassing that which preceded 1872.

* January 1899, pp. 66, 67.

[Diagram: Total Value of British and Irish Produce Exported from
the United Kingdom, 1855-1906. Example of the Method of Smoothing
Curves. Vertical scale: £.]

The  Saturday  Review  went  on  to  ask  why  Sir  Robert  Giffen 
did  not  give  “  proper  quinquennial  averages,”  such  as — 


              Average Annual Value of Exports.

1870-74                £235,000,000
1880-84                 234,000,000
1890-94                 234,000,000
1898                    233,000,000

and it must be granted that this gives an appearance diametrically
opposite to that of the previous table.

It  is  clear  that  we  need  some  general  method  of  bringing 
these  figures  into  a  form  which  shall  be  quite  independent  of  the 
choice  of  any  special  years.  The  diagram  facing  page  134  does 
this.  The  thin  continuous  line,  lying  almost  over  the  dotted 
line  of  annual  values,  shows  triennial  averages  taken  yearly, 
that  is  the  average  of  each  year  with  those  before  and  after  it ; 
this  line  smooths  off  the  corners  without  affecting  the  general 
appearance.  The  line  of  crosses  shows  quinquennial  averages, 
each  year  being  averaged  with  the  two  previous  and  two 
subsequent  years.  The  line  of  circles  shows  decennial  averages ; 
each  circle  is  placed  at  the  centre  of  the  period  whose  average 
it  represents ;  thus  the  circle  showing  the  average  of  the  ten 
years  1875-84  is  placed  vertically  over  the  line  separating  the 
years  1879  and  1880.* 
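The construction of these lines of averages can be sketched in a few lines of Python. The sketch is ours, with names chosen for illustration; the annual values are the first ten of the table of exports. Each average is returned with the date of the centre of its period, as on the diagram; in the printed table the same averages appear to be entered against the last year of each period.

```python
# Annual values of exports 1855-1864, 1 = £1,000,000, from the table.
YEARS = list(range(1855, 1865))
VALUES = [95.7, 115.8, 122.0, 116.6, 130.4, 135.9, 125.1, 124.0, 146.5, 160.4]

def moving_averages(years, values, span):
    """(centre date, average) for every run of `span` consecutive years."""
    out = []
    for i in range(len(values) - span + 1):
        window = values[i:i + span]
        centre = (years[i] + years[i + span - 1]) / 2
        out.append((centre, round(sum(window) / span, 1)))
    return out

print(moving_averages(YEARS, VALUES, 3))   # triennial averages
print(moving_averages(YEARS, VALUES, 5))   # quinquennial averages
print(moving_averages(YEARS, VALUES, 10))  # decennial average
```

A centre such as 1859.5 corresponds to a mark placed over the line separating two years, as described for the decennial circles.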

* In all the curves of averages the mark showing the average is
placed at the centre of gravity of the marks showing the 3, 5, or
10 quantities averaged.

Choice of periods.

On looking at the line of quinquennial averages it is clear that
the Saturday Review did precisely what it accused Sir Robert
Giffen of doing, for years are taken which favour the argument.
The quinquennial periods selected for comparison with 1898 are all
on the upper parts of the waves, the marks showing these averages
are very near
the  maxima  of  the  quinquennial  line,  while  the  year  1898  does 
not  appear  to  be  a  maximum.  We  might  with  just  as  much  or 
as  little  accuracy  give  the  following  : — 

Quinquennial Averages of the Values of Exports.

1865-69                £181,000,000
1875-79                 201,000,000
1885-89                 226,000,000
1898                    233,000,000

and say that the value in 1898 was higher than any of the previous
selected averages. There is no need to use arbitrary dates to get
at the facts. No argument can stand which does not take account of
the cycle of trade, which is not eliminated till we take decennial
averages. Special marks in the diagram show the averages for
decennial periods, indicating a rapid increase before 1870,
followed by steady slow progress till the subsequent expansion.
The complete line gives just the same general appearance. If,
finally, the figures were completely smoothed by a freehand line
keeping as close to this as was possible, without making sudden
changes of curvature, the same appearance would be given; the
thick line on the diagram is an attempt to do this. The smoothing
is obtained by the assumption that the cycle of trade is ten
years; when two maxima fall within the same ten years the average
of this period by our construction gives the appearance of a
maximum (e.g., in 1887) at a date of a minimum. This would be
avoided if we continually changed our period for averaging to
accommodate the changing wave-length, a somewhat arbitrary
proceeding. The difficulty thus arising can be easily corrected by
the eye and the final smoothed line is intended to convey this
corrected impression.
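How far the choice of periods governs the impression can be verified from the annual figures. The following Python sketch is ours (the names are assumptions); the values are copied from the table of exports for 1865 to 1894, and the two selections of quinquennial periods are those quoted in the text.

```python
# Annual values of exports 1865-1894, 1 = £1,000,000, from the table.
FIRST_YEAR = 1865
ANNUAL = [165.8, 188.9, 181.0, 179.7, 190.0,   # 1865-69
          199.6, 223.1, 256.3, 255.2, 239.6,   # 1870-74
          223.5, 200.6, 198.9, 192.8, 191.5,   # 1875-79
          223.1, 234.0, 241.5, 239.8, 233.0,   # 1880-84
          213.1, 212.7, 221.9, 234.5, 248.9,   # 1885-89
          263.5, 247.2, 227.1, 218.1, 215.8]   # 1890-94

def period_average(first, last):
    """Average annual value over the years first..last inclusive."""
    window = ANNUAL[first - FIRST_YEAR:last - FIRST_YEAR + 1]
    return sum(window) / len(window)

crests = [(1870, 1874), (1880, 1884), (1890, 1894)]    # Saturday Review's choice
troughs = [(1865, 1869), (1875, 1879), (1885, 1889)]   # the alternative choice

print([round(period_average(a, b)) for a, b in crests])
print([round(period_average(a, b)) for a, b in troughs])
```

The same series gives a stationary appearance from one selection and a steady rise from the other, which is the reason for preferring averages taken at every year.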

It should be clear now that it was in 1899 five years too soon to
pay attention to the particular figure for 1898; the figures for
the next five years, necessary to determine the character of the
coming wave, could not be foretold. When these are included it is
seen that each decennial average (for 1890-99, 1891-1900, etc.)
established a new record, and that the figures for each year from
1900 to 1906 are greater than those of any previous maximum. It
will be seen, moreover, that the sentence quoted from Sir Robert
Giffen on p. 134 is fully justified.

Meaning of smooth line. Trend.

The smoothed line now constructed represents the general tendency
of the value of exports, when accidental and temporary variations
are removed. If it were possible to separate entirely variations
of short period from secular changes, to separate the ebb and flow
of the tide of commerce from the steady current of increasing
trade, we may suppose that we should obtain a result represented
by this line. In it there are no sudden changes even in rates of
growth, while the addition and subtraction year by year of
relatively small quantities would produce precisely that irregular
fluctuating line from which the smooth line was obtained.

The diagram can be continued from the following numbers (columns
as in the table of exports: year, value, three-, five-, and
ten-yearly averages):

1907   416.0*    369.1     338.0     301.1
1908   366.5*    383.2     354.0     314.4
1909   372.3*    384.9     369.2     326.1
1910   421.6*    386.8     388.7     339.9
1911   448.5*    414.1     405.0     356.7
1912   480.2*    450.1     417.8     377.9
1913   514.2*    481.0     427.4     400.7

* Not including the value of ships exported.


The records during the war are not comparable with those here
given. The reader is recommended to study the diagram as printed
and to judge how far forecasts of amount, fluctuation and general
movement are possible, before looking at the actual records of
1907-13.

The  direction  of  the  smooth  line  at  any  date  may  be  called 
the  trend  of  the  series  at  that  date.  When  the  smooth  line  is 
approximately  straight  over  several  years,  its  general  direction 
shows  the  trend  in  that  period. 

A special method of determining the trend has been recently used
by Professor Moore (Statistical Journal, 1919, p. 375). He assumes
that the general movement over a stretch of years can be
represented by the equation $y = a + bx + cx^2 + dx^3$, and
determines the values of a, b, c and d by the condition that, if
$y_t$ is the observed value at a date $x_t$, then
$S(y_t - a - bx_t - cx_t^2 - dx_t^3)^2$ should be a minimum.
Professor Persons (Review of Economic Statistics, Harvard,
Preliminary Volume, No. 1) assumes that a straight line is
sufficiently accurate and minimises $S(y_t - a - bx_t)^2$. It is
doubtful whether either of these methods is of general
application, and Persons’ hypothesis in particular must be used
with discretion. The method of moving averages (used in the text
above) is certainly more sensitive for showing changes in the
direction of the trend if a long series of years is under
consideration, and the general causes which determine the
phenomena have definitely varied several times.
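The straight-line case can be worked without special apparatus. The Python sketch below is ours and is not taken from either paper cited: it uses the ordinary closed-form solution for the line which makes the sum of squared deviations a minimum, and applies it, purely as an illustration, to the annual exports of 1895-1906 from the table, with x counted in years from 1895.

```python
def fit_line(xs, ys):
    """Constants (a, b) of y = a + b*x minimising the sum of (y - a - b*x)**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Annual exports 1895-1906, 1 = £1,000,000; x is years since 1895.
exports_1895_1906 = [225.9, 240.1, 234.3, 233.4, 255.3, 283.6,
                     270.9, 277.7, 286.5, 296.3, 324.4, 367.0]
a, b = fit_line(list(range(12)), exports_1895_1906)
print(a, b)  # b is the fitted average yearly increase over the stretch
```

A single straight line gives one slope for the whole stretch, which illustrates why it is less sensitive than moving averages to changes in the direction of the trend.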


Smoothing a homogeneous group.

The fuller discussion of “smoothing” series of figures belongs to
the chapter on interpolation, but one other group may be here
considered, as showing the use of the graphic method for obtaining
regularity out of irregular raw material. Referring back to the
figures  given  on  p.  69,  we  can  exhibit  the  wages  of  5000 
workers  anew  by  a  diagram,  in  which  the  ordinates  represent 
the  numbers  earning  at  or  above  a  certain  wage.  The  thin 
angular  line  on  the  adjacent  page  represents  these  numbers, 
entered  for  every  10-cent  group.  This  plan  is  especially  useful 
for  irregular  figures,  like  this  wage-group,  for  the  line  must 
always  tend  upwards  from  the  numbers  earning  the  highest 
wage  to  the  numbers  earning  at  least  the  lowest.  The  diagram 
is  also  at  once  adaptable  to  the  graphic  method  of  finding  the 
median  described  on  p.  106. 

Graphic method of finding the median.

The irregularities shown by the thin line do not arise from any
law of wage-grouping, but are due to the accidents of observation;
if we regard these returns as samples out of a much larger
unregistered group, we may suppose that a smoothed curve will
indicate approximately the form which would be obtained, if our
returns were complete. To smooth this figure, draw a freehand line
passing as near the points as possible without abrupt changes of
curvature, as in the annexed diagram. A new approximation may be
made for the median, quartiles, etc., by drawing horizontal lines
through the points on the vertical scale corresponding to half,
one-quarter, three-quarters, etc., of the workers; from the points
where these cross the smooth line, draw vertical lines to the
scale of dollars; the points on the scale so obtained are the
median (quartile, etc.) wage.
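The construction just described can be imitated on the angular line by linear interpolation. The Python sketch below is ours; the wages and numbers are hypothetical, and the straight segments stand in for the freehand curve, so the results are only as good as that substitution.

```python
def wage_at_count(wages, at_or_above, target):
    """Wage at which the descending cumulative line meets the horizontal
    drawn at `target` on the vertical scale (straight-line interpolation)."""
    for w0, w1, n0, n1 in zip(wages, wages[1:], at_or_above, at_or_above[1:]):
        if n0 >= target >= n1 and n0 != n1:
            return w0 + (w1 - w0) * (n0 - target) / (n0 - n1)
    raise ValueError("target lies outside the range of the diagram")

# Hypothetical returns: numbers earning at or above each wage (dollars).
wages = [1.0, 2.0, 3.0]
at_or_above = [100, 40, 0]
total = at_or_above[0]

print(wage_at_count(wages, at_or_above, total / 2))      # median
print(wage_at_count(wages, at_or_above, 3 * total / 4))  # lower quartile
print(wage_at_count(wages, at_or_above, total / 4))      # upper quartile
```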


The results obtained are:

                                          Median.  Quartile.  Quartile.
Given on p. 70                             $1.49
By method of p. 106, used in annexed
  diagram                                  $1.49     $1.16      $2.12
From smooth curve in annexed diagram       $1.51     $1.15      $2.13
By method of interpolation, p. 227         $1.536     ...        ...

[Diagram to face page 138: Graphic Methods of Determining the
Median and Modes. Vertical scale: numbers earning at or above the
wage given below.]

This  method  is  not,  however,  one  of  great  precision ;  a  very  slight 
change  in  the  curvature  of  the  smoothed  line  would  make  more 
difference  than  those  shown  between  the  second  and  third  lines 
in  the  above  table. 

Graphic method of finding the mode.

This method is useful for determining the mode approximately. It
will be remembered that the difficulties in doing this before
arose from the uneven distribution on the two sides of the mode,
and in the displacement of the mode by the adoption of a second
system of tabulation. The first of these difficulties entirely
disappears in the graphic method, while the second is diminished,
for the displacement now only depends on the slight possible
variations in the curvature of the smooth line. The mode is
clearly the position where the greatest number is added, in the
present method of representing the figures: that is, the mode is
where the line, angular or smooth, is steepest. On the smooth
curve the maximum steepness is where the tangent crosses the
curve; in mathematical language, at a point of inflexion. This can
be determined mechanically by placing a ruler to touch the curve,
and turning it round the curve till it crosses it. On the annexed
figure this occurs in the interval between $1.10 and $1.40. A more
complex method of determining both mode and median is discussed in
Chap. X, pp. 227-8.
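On the angular line the steepest part can be found by comparing the fall per unit of wage in each interval. The sketch below is ours, in Python, with hypothetical figures; it locates only the interval containing the mode, not a point of inflexion on a smoothed curve.

```python
def steepest_interval(wages, at_or_above):
    """Interval of the cumulative line with the greatest fall per unit of wage."""
    best = None
    for w0, w1, n0, n1 in zip(wages, wages[1:], at_or_above, at_or_above[1:]):
        fall_per_unit = (n0 - n1) / (w1 - w0)
        if best is None or fall_per_unit > best[0]:
            best = (fall_per_unit, w0, w1)
    return best[1], best[2]

# Hypothetical returns at irregular intervals of graduation (dollars).
wages = [1.0, 1.5, 2.0, 3.0]
at_or_above = [100, 90, 40, 0]
print(steepest_interval(wages, at_or_above))  # interval containing the mode
```

Dividing by the width of the interval is what allows irregular graduation to be treated by the same construction as regular returns.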

This graphic way of finding these means has two great advantages.
It can be applied to numbers which are given at irregular
intervals of graduation (e.g., 30 at 30s. 6d., 40 at 30s. 35 at
40s. 1d., etc.) as easily and by exactly the same construction as
to more regular returns; and if the smooth curve is carefully
drawn, the number of modes can be seen at a glance and the
individual importance of each can be estimated. In the annexed
diagram, the curve is concave to the base line from $.30 to about
$1.20, convex from about $1.20 to $3.15, concave till $3.40, and
then convex till the end. The points of inflexion or the modes are
where concavity gives way to convexity. Hence there are two modes,
of which that near $3.40 is of the less importance.

Pictorial diagrams.

A large class of diagrams may be passed by with a few words.
Writers and lecturers frequently use points, lines, triangles,
squares, circles, even pictures, of different sizes to assist the
presentation of the relative magnitude of numbers. These have
their use for popular lectures and hand-books, but do not add
anything to the significance of the figures. Collections of these
may be found in the second volume of Gabaglio’s Teoria Generale
della Statistica, and in M. Levasseur’s La Statistique Graphique
in the Jubilee Volume of the Royal Statistical Society.

Of  these  one  group  may  be  signalled  as  of  practical  use. 
Rectangles  may  be  used  to  express  three  quantities  :  one  side 
to  represent  price;  the  adjacent  side,  quantity;  and  the  area, 
value  :  or  number  of  houses,  average  number  of  inmates  and 
population  :  or  number  of  hours’  work  per  week,  average  output 
or  hourly  wage,  and  total  output  or  weekly  wage.  The  figures 
on  the  annexed  page  show  the  limit  to  which  this  method  can 
be  usefully  pushed. 
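The relation between the two sides of such a rectangle and its area is a simple product, as a short calculation shows. The Python sketch below is ours; it takes the rates and weekly wages printed with the rectangle diagram (8d. and 4d. per hour, £1 13s. 4d. and £1 per week), while the hours of 50 and 60 are our own inference from dividing the wage by the rate.

```python
def to_pounds_shillings_pence(pence):
    """Split an amount in pence into (pounds, shillings, pence)."""
    pounds, rest = divmod(pence, 240)
    shillings, d = divmod(rest, 12)
    return pounds, shillings, d

def rectangle_area(pence_per_hour, hours_per_week):
    """Area of the rectangle: rate (one side) times hours (adjacent side)."""
    return pence_per_hour * hours_per_week

print(to_pounds_shillings_pence(rectangle_area(8, 50)))  # the higher-paid budget
print(to_pounds_shillings_pence(rectangle_area(4, 60)))  # the lower-paid budget
# One square inch on the stated scales: 8d. wide (.125 in. = 1d.)
# by 20 hours high (.1 in. = 2 hours).
print(to_pounds_shillings_pence(rectangle_area(8, 20)))
```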


Representation  of  Three  Facts  by  Rectangles. 


Imaginary  budgets  of  an  artisan  and  a  labourer,  showing  amounts 
spent  weekly  on  various  commodities,  and  number  of  hours’  work 
necessary  for  each  amount. 


[Figure: two rectangles, each divided into strips for the several
commodities (among them "Clothes" and "All else"). First
rectangle: per week, £1 13s. 4d.; 8d. per hour. Second rectangle:
per week, £1; 4d. per hour.]

The horizontal scale represents pence per hour. .125 inch = 1d.

The vertical scale represents number of hours per week.
.1 inch = 2 hours.

The areas represent amounts spent, and the whole rectangles show
the week’s wages on the same scale. 1 sq. in. = 13s. 4d.
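
The arithmetic behind the rectangles can be checked with a short sketch. It assumes the two labels read from the figure (a week's wage of £1 13s. 4d. at 8d. per hour, and of £1 at 4d. per hour) and the three scales stated above; the function and variable names are illustrative only.

```python
from fractions import Fraction

def to_pence(pounds=0, shillings=0, pence=0):
    """Convert pounds, shillings and pence to pence (20s. = £1, 12d. = 1s.)."""
    return (pounds * 20 + shillings) * 12 + pence

def rectangle(weekly_wage_pence, pence_per_hour):
    """Return (hours, width in inches, height in inches, area in sq. in.)."""
    hours = Fraction(weekly_wage_pence, pence_per_hour)
    width = Fraction(1, 8) * pence_per_hour    # .125 inch = 1d.
    height = Fraction(1, 10) * hours / 2       # .1 inch = 2 hours
    return hours, width, height, width * height

PENCE_PER_SQ_INCH = to_pence(0, 13, 4)         # 1 sq. in. = 13s. 4d.

first = rectangle(to_pence(1, 13, 4), 8)       # £1 13s. 4d. at 8d. per hour
second = rectangle(to_pence(1), 4)             # £1 at 4d. per hour

for name, (hours, width, height, area) in (("first", first), ("second", second)):
    print(name, "hours:", hours, "inches:", float(width), "x", float(height),
          "area in pence:", area * PENCE_PER_SQ_INCH)
```

On these assumptions the area of each rectangle, converted at 13s. 4d. per square inch, returns the week's wage, which is the property the method relies on.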


A Joint Committee on Standards for Graphic Representation
has since 1916 worked at the best methods for presenting
statistics graphically, and has made many useful suggestions
which may produce uniformity in treatment and avoid
errors.

[Cartograms.] The use of statistical maps can only be afforded a
brief notice here. Any numerical quality of a population, its
density, average income, average taxation, may be shown district
by district by suitable markings, or colours. Of these the most
useful method is to choose one colour, say blue, for excess above
the average; another, say red, for defect. Divide the districts
into nine groups, say more than 7 per cent., 5 to 7 per cent.,
3 to 5 per cent., 1 to 3 per cent. above the average: these should
be marked by four shades of blue, becoming lighter as the average
is approached; within 1 per cent. of the average, above or below,
should be white; and shades of red, gradually becoming darker,
will show the remaining grades below the average. Care must be
taken not to adopt too many grades. For examples of this method
see Booth’s Life and Labour of the People, maps; the Statistical
Atlas of the XIth Census of the United States; the Statistical
Atlas of India; and the maps in M. Levasseur’s paper just
mentioned. A cheap and very effective method, by which
similar results are obtained in black and white only, may be
seen on Plate P (misprinted 2) in that paper, and in the excellent
chapter on Graphic Representation in Bertillon’s Cours
élémentaire de Statistique, p. 133 seq.
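
The nine-fold grading just described may be sketched as a small function. The treatment of values falling exactly on a boundary (1, 3, 5 or 7 per cent.) is not settled by the text and is an assumption here, as are the labels, in which step 1 is the lightest shade and step 4 the darkest.

```python
def grade(percent_deviation):
    """Class of a district from its deviation from the average, in per cent.

    Positive deviations (excess) are blue, negative (defect) red;
    anything within 1 per cent. of the average is left white.
    """
    size = abs(percent_deviation)
    if size <= 1:
        return "white"
    colour = "blue" if percent_deviation > 0 else "red"
    if size <= 3:
        step = 1      # lightest shade, nearest the average
    elif size <= 5:
        step = 2
    elif size <= 7:
        step = 3
    else:
        step = 4      # darkest shade, more than 7 per cent. away
    return f"{colour}-{step}"

# Hypothetical deviations, chosen only to exercise every class.
samples = [9, 6, 4, 2, 0.5, -2, -4, -6, -9]
classes = [grade(d) for d in samples]
print(classes)
```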

A  common  defect  in  maps  of  this  class  arises  from  the  fact 
that  records  generally  relate  to  administrative  areas,  while  the 
phenomena  to  be  represented  are  independent  of  these.  An 
example  will  make  this  difficulty  evident.  If  a  map  is  made  of 
England  in  1911  colouring  the  counties  according  to  the 
density  of  population,  Cumberland  will  be  marked  by  the 
colour  appropriate  to  27  persons  per  100  acres,  and  Northumber¬ 
land  by  that  for  53  persons.  The  colour  will  change  abruptly 
in  a  moorland  region  where  for  many  miles  the  population  is  of 
a  uniform  sparseness.  This  difficulty  can  be  overcome  by 
either  of  two  methods.  Minute  divisions,  e.g.,  civil  parishes, 
can  be  taken  as  the  units,  and  each  shaded  in  black  only, 
the  amount  of  pigment  increasing  with  the  population ;  or  the 
population  can  be  marked  in  situ  as  accurately  as  the  data 
allow,  a  dot  of  uniform  size  being  placed  for  each  100  people, 
with a modification of method for dense districts. A map of
this kind is reproduced in Professor Secrist’s Statistical Methods,
1917, p. 189.


2.  Historical  Diagrams. 


Perhaps  the  chief  use  of  diagrams  is  to  afford  a  rapid  view 
of  the  relations  between  two  series  of  events. 

[Comparison of figures.] The different cases that occur are best
illustrated by examples. The simplest is when we wish to compare
two sets of figures expressed in the same unit, say £ sterling;
and the simplest of these when we wish simply to
compare a whole and its parts.

[Illustrated by the revenue.] On the adjacent diagram the upper
line shows the annual total gross revenue of the United Kingdom
(Statistical Abstract, 1906*); the next line, that part which
comes from inland revenue and customs, the difference
being mainly composed of post office receipts. The principal
heads  of  revenue  are  customs,  excise,  income  tax,  and  post 
office.  These  are  shown  by  suitable  lines  for  each  year,  each 
line  being  independent  of  the  other,  and  all  having  the  same 
base  line  and  being  on  the  same  scale.  This  method  is  greatly 
preferable  to  the  alternative  one  of  drawing  a  second  line 
representing  the  total  less  customs,  a  third  the  total  less  customs 
and  excise,  and  so  on,  because  the  eye  is  then  quite  incapable 
of  judging  the  relative  movements  of  the  separate  items. 
The  figure  shows  at  once  the  main  features  of  the  course  of 
revenue.  The  increase  has  been  rapid  but  irregular.  The 
rapid  growth  in  1854-57  was  not  at  once  maintained,  but  the 
figures for the 60’s are at a far higher level than those for the
50’s. A rapid fluctuation in 1870 is followed by a more regular
growth  almost  unchecked  till  1887;  and  then,  after  a  short 
stationary  period,  there  are  great  increases  in  1895,  and  between 
1898  and  1903.  Nearly  the  same  remarks  apply  to  the  line 
showing  inland  revenue  and  customs.  If  we  look  for  the  parts 
of  the  revenue  that  have  borne  the  increase  and  change,  we  see 
that  prior  to  1900  receipts  from  excise  had  increased  most, 
next  those  from  the  post  office,  and  next  those  from  the  income 
tax,  while  the  customs  had  diminished.  Each  line  has  its 
distinctive  features.  The  post  office  payments  show  an  almost 
regular  growth.  The  income  tax  fluctuates  violently,  bearing 
the  brunt  of  nearly  all  the  rapid  changes  in  the  total,  especially 
in  1856  and  1870,  and  1900-02.  The  excise  line  shows  a 
moderate  increase  till  1870,  a  sudden  jump  to  1874,  and  a  very 
slow  growth  since  that  date.  Customs,  on  the  other  hand, 
have  to  some  extent  taken  an  opposite  course  to  that  of  excise, 
so  that  the  total  from  the  two  had  not  changed  very  rapidly 
prior to 1900. At the top of the page a new base line is taken,
and the number of pounds per head of the population is shown
year by year; it will be seen that the only important increases
were between 1853 and 1857, and from 1898 to 1903.

* This diagram cannot be carried later owing to a change in the
book-keeping of Imperial and Local taxation accounts.


Revenue of the United Kingdom.

Unit, in all columns, £10,000.

Year ended        Total      Inland     Customs      Excise    Property    Post and
31st March      Revenue Revenue and                          and Income   Telegraph
                            Customs                                 Tax

1850              5,739       5,431       2,226       1,497        560*         216
1851              5,732       5,412       2,204       1,528        560*         228
1852              5,658       5,335       2,222       1,538        550*         237
1853              5,753       5,401       2,214       1,575        570*         237
1854              5,890       5,502       2,251       1,630        580*         252
1855              6,282       5,944       2,163      1,680*      1,070*         237
1856              7,026       6,601       2,324      1,730*      1,520*         281
1857              7,279       6,848       2,353      1,840*      1,620*         292
1858              6,788       6,309       2,311       1,782       1,159         292
1859              6,548       5,987       2,412       1,790         668         320
1860              7,109       6,570       2,446       2,036         960         331
1861              7,028       6,514       2,331       1,943       1,092         340
1862              6,986       6,412       2,367       1,833       1,036         351
1863              7,060       6,390       2,403       1,715       1,057         365
1864              7,021       6,306       2,323       1,821         908         381
1865              7,031       6,291       2,257       1,956         796         410
1866              6,781       6,036       2,128       1,979         639         425
1867              6,943       6,156       2,230       2,067         570         447
1868              6,960       6,204       2,265       2,016         618         463
1869              7,259       6,422       2,242       2,046         862         466
1870              7,543       6,708       2,153       2,176       1,004         477
1871              6,994       6,106       2,019       2,279         635         527
1872              7,471       6,484       2,033       2,333         908         543
1873              7,661       6,660       2,103       2,578         750         583
1874              7,734       6,608       2,034       2,717         569         700
1875              7,492       6,397       1,929       2,739         431         679
1876              7,713       6,525       2,002       2,763         411         719
1877              7,857       6,636       1,992       2,774         528         730
1878              7,774       6,610       1,997       2,746         582         746
1879              8,115       6,899       2,032       2,740         871         757
1880              7,934       6,695       1,933       2,530         923         777
1881              8,187       6,895       1,918       2,530       1,065         830
1882              8,396       7,058       1,929       2,724         994         863
1883              8,739       7,313       1,966       2,693       1,190         901
1884              8,616       7,187       1,970       2,695       1,072         947
1885              8,799       7,380       2,032       2,660       1,200         966
1886              8,958       7,493       1,983       2,546       1,516         989
1887              9,077       7,611       2,015       2,525       1,590       1,028
1888              8,980       7,566       1,963       2,562       1,444       1,060
1889              8,847       7,360       2,007       2,560       1,270       1,118
1890              8,930       7,341       2,042       2,416       1,277       1,177
1891              8,949       7,358       1,948       2,479       1,325       1,226
1892              9,099       7,534       1,974       2,561       1,381       1,263
1893              9,040       7,480       1,971       2,536       1,347       1,288
1894              9,113       7,543       1,971       2,520       1,520       1,301
1895              9,468       7,865       2,011       2,605       1,560       1,334
1896             10,197       8,512       2,076       2,680       1,610       1,422
1897             10,395       8,597       2,125       2,746       1,665       1,477
1898             10,661       8,855       2,180       2,830       1,725       1,518
1899             10,834       8,945       2,085       2,920       1,800       1,586
1900             11,984       9,963       2,380       3,210       1,875       1,665
1901             13,038      10,956       2,626       3,310       2,692       1,725
1902             14,300      12,189       3,099       3,160       3,480       1,779
1903             15,155      12,993       3,443       3,210       3,880       1,838
1904             14,155      11,935       3,385       3,155       3,080       1,915
1905             14,337      12,053       3,573       3,075       3,125       1,993
1906             14,398      11,987       3,447       3,023       3,135       2,101

* These figures cannot be given accurately within £100,000.

[Choice of second scale.] So far we have found no more difficulty
in the choice of scales than previously when dealing with only one
line, for all the lines on the larger diagram indicate millions
of pounds, and when the unit is £1, a new base
line has been adopted. But we may wish to show the change
of  population  on  the  larger  diagram.  It  is  necessary,  as 
we  have  already  seen,  to  use  the  same  base  line  for  the  two 
quantities  to  be  compared ;  but  we  may  choose  any  point  for 
the  beginning  of  the  new  line,  adapting  our  vertical  scale,  for  the 
eye  can  judge  the  proportionate  changes  wherever  the  line  is 
placed.  It  is  best  to  decide  this  point  by  defining  the  problem 
on  which  the  comparison  should  throw  light.  If  it  is  required  to 
compare  the  growth  of  revenue  with  the  growth  of  population 
since,  say,  1850,  we  should  start  the  new  line  at  the  point  on 
the  1850  line  where  the  revenue  curve  begins,  and  we  can  then 
see  how  the  lines  intersect  one  another  again  and  again.  Since 
1850,  however,  is  an  arbitrary  date,  this  plan  lacks  definition, 
and  it  is  more  logical  to  make  the  lines  coincide  at  the  most 
recent  date  given,  with  which  any  previous  date  can  then  be 
compared.  On  the  diagram  the  line  is  drawn  on  such  a  scale 
that  it  lies  fairly  close  to  that  for  inland  revenue  throughout 
the  greater  part  of  its  course. 
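
The adaptation of the vertical scale so that two lines coincide at a chosen date amounts to multiplying one series by a constant factor. The sketch below assumes the revenue totals for 1850 and 1906 from the table of revenue (unit £10,000); the population figures are placeholders invented for illustration and are not taken from the text.

```python
def scale_to_coincide(series, reference, year):
    """Rescale `series` by a constant so that it equals `reference` at `year`."""
    factor = reference[year] / series[year]
    return {y: value * factor for y, value in series.items()}

# Total revenue, unit £10,000 (from the table of revenue).
revenue = {1850: 5739, 1906: 14398}

# Population in millions: illustrative placeholder values only.
population = {1850: 27.0, 1906: 43.0}

# Lines made to coincide at the most recent date, as the text prefers.
at_latest = scale_to_coincide(population, revenue, 1906)

# Lines made to start together in 1850, the alternative plan.
at_start = scale_to_coincide(population, revenue, 1850)

print(at_latest)
print(at_start)
```

Whichever date is chosen, the ratios between the points of the rescaled line are unaltered, so the eye still judges proportionate change correctly.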

[Comparison of quantity and ...] The next diagram, facing p. 146,
introduces further difficulties as to the choice of scales. The
object of the figure is to show the relations between quantity,
value, and price of imported wheat, and population. The line
A is first drawn on a scale chosen so as to throw its
fluctuations  into  relief.  Population  is  at  once  brought  into 
relation  with  this  by  calculating  the  amount  per  head  year  by 
year.  The  line  C  to  represent  these  figures  is  drawn  on  a 
different  scale,  chosen  so  that  the  line  shall  not  cause  confusion 
by  continually  crossing  any  of  the  others  on  the  figure.  If  the 
figure  was  too  full  this  could  be  treated  as  on  p.  142,  the  revenue 
per  head.  The  same  scale  of  years  must  be  used,  and  for 
simplicity  of  calculation  and  appearance,  100  lbs.  consumed  per 
head  is  measured  by  the  same  vertical  distance  as 
10,000,000  cwt.  imported.  A  and  C  refer  to  the 
same quantities, and therefore similar lines are used in both
cases. The line B represents value and is shown by a broken
line. For this line the choice of scale is more difficult. In the
diagrams which follow, instances will be shown where special
methods are used to bring out specific comparisons. Here this
is not necessary, and a scale is adopted which brings the lines
A and B into near relation, and shows the fluctuations of B,
while the figure is made simple and intelligible by the
representation of £20 by the same vertical distance as 20 cwt.

The  line  D  shows  the  changing  price  of  wheat  as  deduced 
from  columns  A  and  B.  The  scale  is  chosen  so  that  it  boldly 
crosses  the  lines  A  and  B;  thus  its  fluctuations  are  clearly 
shown, and the numbers are easily seen, for 2s. per cwt. is
represented by the same vertical line as 10,000,000 cwt. If the
figure  was  accurately  drawn,  lines  A  and  D  would  lie  one  over 
the  other  in  1876-77 ;  they  are  therefore  shifted  very  slightly 
horizontally,  and  clearness  is  preserved  without  the  general 
impression  being  vitiated. 
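
The deduction of column D from columns A and B is a simple division, which the following sketch reproduces for three years of the table. Since A is in units of 100,000 cwt. and B in units of £100,000, the units cancel, and the factor 20 converts pounds to shillings. The function name is illustrative.

```python
def price_in_shillings_per_cwt(quantity_a, value_b):
    """Average value per cwt. in shillings.

    quantity_a: total quantity imported, unit 100,000 cwt. (column A)
    value_b:    total value imported, unit £100,000 (column B)
    """
    return 20 * value_b / quantity_a

# (A, B) for three years, as printed in the table.
rows = {1862: (500, 286), 1863: (309, 155), 1867: (391, 285)}

prices = {year: round(price_in_shillings_per_cwt(a, b), 2)
          for year, (a, b) in rows.items()}
print(prices)
```

The results agree with the entries printed in column D for those years (11.44, 10.03 and 14.58).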

[Movements needing explanation.] The lines in the diagram,
elucidated by the table, suggest many characteristics and changes
which call for explanation by students of economic history. The
consumption of imported wheat per head increased for
thirty years to 1895, and was then lower for some
years.  The  quantity  imported  shows  violent  short-period 
fluctuations.  The  price  after  violent  fluctuations  from  1862 
to  about  1878  fell  for  seventeen  years  with  little  intermission. 
Here  no  doubt  is  shown  the  effect  of  many  causes  :  an  increasing 
population,  the  fact  that  wheat  imported  is  complementary  to 
the  home  product  which  is  dominated  by  the  English  weather, 
the  variation  of  harvests  all  over  the  world,  political  events, 
the fall in the value of silver, the development of communication
and transport, etc. The function of the diagram is to
show  the  general  trends  and  the  dates  of  change,  but  of  course 
one  cannot  from  it  ascertain  the  causes. 

As  regards  the  choice  of  markings  for  different  lines,  the 
chief  rule  is  that  lines  which  cross  one  another,  unless  very 
acutely,  must  be  marked  differently.  The  second  rule  is  to 
mark  similar  quantities  in  similar  ways. 

If  it  is  possible  to  use  more  than  one  colour  this  principle 
can  be  easily  carried  out.* 

* See Wages in the Nineteenth Century, by the present author,
diagram facing p. 90.


Importations of Wheat and Wheat Flour, 1862 to 1906.

Wheat flour is reckoned at its equivalent in grain.

A. Total Quantities Imported. Unit, 100,000 cwt.
B. Total Value Imported. Unit, £100,000.
C. Quantity retained per Head of the Population, in lbs.
D. Average Value of Wheat and Wheat Flour in Shillings per cwt.

Year       A.       B.       C.       D.

1862      500      286      191    11.44
1863      309      155      118    10.03
1864      288      135      109     9.37
1865      258      124       97     9.61
1866      294      168      110    11.43
1867      391      285      144    14.58
1868      365      249      134    13.64
1869      444      233      166    10.50
1870      369      196      132    10.62
1871      444      268      158    12.07
1872      476      303      168    12.73
1873      516      344      180    13.33
1874      493      309      170    12.53
1875      595      324      203    10.89
1876      519      279      176    10.75
1877      635      407      212    12.82
1878      597      342      197    11.46
1879      730      400      239    10.95
1880      685      393      222    11.47
1881      713      407      229    11.42
1882      808      449      257    11.11
1883      851      438      269    10.30
1884      669      301      210     9.00
1885      823      337      256     8.19
1886      670      261      207     7.79
1887      802      314      245     7.82
1888      804      315      244     7.82
1889      789      311      238     7.88
1890      824      327      246     7.94
1891      895      396      265     8.85
1892      956      371      281     7.76
1893      938      308      273     6.57
1894      967      268      277     5.54
1895    1,073      302      305     5.63
1896      996      309      279     6.21
1897      887      330      247     7.44
1898      944      377      259     7.99
1899      985      330      267     6.71
1900      986      334      266     6.78
1901    1,011      334      270     6.60
1902    1,079      360      288     6.67
1903    1,167      397      309     6.80
1904    1,182      415      310     7.02
1905    1,142      413      296     7.23

The following table contains numbers for continuing the
diagram.

Year       A.       B.       C.       D.

1906    1,127      395      290     7.01
1907    1,156      440      295     7.61
1908    1,091      454      275     8.32
1909    1,132      516      284     9.12
1910    1,191      497      296     8.35
1911    1,120      442      276     7.89
1912    1,237      520      301     8.41
1913    1,225      502      296     8.20


[Trend and fluctuations.] The general characteristics of a series
in time are to be found in its trend and in the nature of its
fluctuations, and such series may be classified as follows:

(a) With trend, in constant or gradually changing direction,
and no fluctuations. Statistics of the population
of a country are generally in this class.*

(b)  With  random  fluctuations;  that  is,  fluctuations  of  such 
a  nature  that  when  a  movement  (up  or  down)  is  recorded  in  a 
year  it  does  not  lead  to  any  forecast  as  to  whether  the  move¬ 
ment  in  the  following  year  will  be  up  or  down.  Ex.  Annual 
statistics  of  rainfall. 

(c)  With  compensating  fluctuations;  that  is,  when  an 
upward  movement  in  one  year  is  generally  compensated  by  a 
downward  movement  in  the  following.  Birth,  death  and 
marriage  rates  frequently  show  such  compensation. 

(d) Undulatory; that is, when after a maximum or crisis
downward  movements  follow  one  another  for  some  years  till  a 
minimum  is  reached  and  then  there  are  successive  upward 
movements.  General  price  statistics,  and  indeed  that  great 
mass  of  records  which  is  related  to  the  so-called  commercial 
cycles,  are  of  this  nature. 

(e) Periodic; that is, when every ten years or twelve
months,  or  some  other  period,  the  sequence  of  ups  and  downs 
is  repeated  in  the  same  order  and  (in  some  cases)  the  magnitude 
of  the  fluctuations  is  repeated.  A  seasonal  example  is  given 
on  pp.  159  seq.  below. 

In (b), (c), (d), and (e) a trend may be combined with the
fluctuations. We may also have random or compensated
fluctuations superimposed on an undulatory movement and
a trend; ripples on the waves of a rising tide. When we have
a time series of records, it is very important to consider the
general nature of the trend and fluctuations shown, in order
to form a judgment of the near future. If fluctuations are seen
to be random and violent, we shall not be disturbed by a low
record and believe some remedial measures to be necessary.
In the case of compensating fluctuations, we shall anticipate a
high value after a low one. If the series is undulatory we shall
be prepared for a deferred recovery after the figures have once
broken from a high value.

* There are also series where the records are equal over several
years and then move abruptly to another level and there remain
for a time. Standard time-rates afford an example of this kind.
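
One rough way of telling compensating from undulatory fluctuations is to count how often a movement in one year is reversed in the next. This test is not given in the text; it is offered only as a sketch, and the two short series are invented for illustration. A share of reversals near 1 suggests compensation, a small share suggests undulation, and an intermediate share is what random movements would tend to give.

```python
def reversal_share(series):
    """Share of successive year-to-year changes that reverse direction."""
    changes = [later - earlier for earlier, later in zip(series, series[1:])]
    # Pairs of consecutive changes; years with no change are ignored.
    pairs = [(x, y) for x, y in zip(changes, changes[1:]) if x != 0 and y != 0]
    if not pairs:
        raise ValueError("need at least two non-zero successive changes")
    reversals = sum(1 for x, y in pairs if (x > 0) != (y > 0))
    return reversals / len(pairs)

# Invented series: every rise is followed by a fall, and conversely.
compensating = [10, 14, 9, 15, 8, 13]

# Invented series: several rises in succession, then several falls.
undulatory = [10, 12, 14, 16, 14, 12, 10, 12, 14]

print(reversal_share(compensating))
print(reversal_share(undulatory))
```

A trend, if present, should be removed or allowed for before such a count is relied upon, since a steady rise suppresses reversals of its own accord.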

3.  Comparisons  of  Series  of  Figures. 

A.  Before  proceeding  to  the  study  of  the  next  diagram,  it  will 
be  well  to  define  more  exactly  what  is  our  object  in  comparative 
studies  of  figures,  and  to  consider  the  means  at  our  disposal. 

[Quæsita in comparisons.] When dealing with two series of similar
quantities such as the course of trade or population in two
countries, we wish to see the general rate of progress (as can be
done by smoothing the curve), the years of special increase,
the dates of maximum and minimum, in fact to compare the
three  things  that  the  eye  can  see — the  increase,  the  rate  of 
increase,  and  the  dates  of  change  of  rate  of  increase.  The  most 
obvious  way  to  do  this  is,  to  take  the  same  scale  and  base  line 
for  both  countries  and  the  same  unit  of  measurement ;  but  this 
method  does  not  take  us  all  the  way.  We  can  judge  differences, 
it  is  true,  and  the  additions  in  all  the  years  in  both  countries,  and 
we  can  see  the  highest  and  lowest  points  and  dates  of  change  of 
rate  of  increase;  but  we  cannot  compare  rates  of  increase. 
It  is  not  easy  to  judge  ratio,  though  a  rough  guess  at  it  is 
possible.  Thus  if  the  trade  is  very  different  in  magnitude  in  the 
two  countries,  equal  absolute  increments  will  mean  very  different 
relative  increments,  and  it  is  difficult  to  be  always  on  one’s  guard. 

[Percentage scales.] The remedy for this is to alter the
arrangement of scales. Make a second figure, in which the unit
shall be not a sum of money, but a percentage: let 1 per cent. of
England’s trade, say in 1850, be the unit for the English line;
and 1 per cent. of the trade of Germany, at the same date, for the
German line. In other words, express the trade of both countries
as percentages of their value in a given year, and draw lines to
represent these percentages. Alongside the diagram two or more
scales can be placed showing the absolute amounts of the trade of
each country. Then the rates of increase will be comparable,
equal increments representing equal percentages of the trade of
each country in 1850; and, in addition, the dates at which either
country gained ground relatively to the other can be easily
picked out.

[Absolute or relative progress.] The question whether absolute
rates or relative rates should be studied is a very common one in
statistics. Sometimes the absolute magnitude should be known, as
for instance when we want to estimate the effect of measures
which will affect the well-being of special classes, or the
trade of special countries; sometimes the relative rate, as when
we want to watch the progressive increase of different industries,
or to be on our guard as to future competitors. The two studies
generally require two different diagrams though they may
represent the same numbers.

It  will  be  seen  that  the  chief  difficulty  lies  in  the  choice  of 
the  year  in  which  the  quantities  are  to  be  equated ;  this  must 
be  decided  by  the  nature  of  the  argument  which  the  diagram 
is  to  illustrate. 


We may compare the following numbers:

Year     1880    1890    1900

A         220     440     330
B         160     240     400

in three ways, shown in the diagrams on p. 151.

In  Figure  3  the  fluctuations  are  seen  as  percentages  of  the 
values  at  the  last  date,  and  are  thrown  into  better  proportion 
than  in  Figure  1.  It  is  frequently  the  case  that  the  equating  of 
quantities  at  the  most  recent  date  throws  what  are  often  small 
beginnings  into  their  right  proportion  when  viewed  from  the 
modern standpoint. The statement that the values in 1880
were 67 and 40 per cent. respectively, for A and B, of the
corresponding present values is in better perspective than the
statement that the values in 1900 were 150 per cent. and
250 per cent. of the corresponding values in 1880; but
circumstances must decide in each case which method is to be
adopted.

[Diagrams: the two lines A and B drawn three times, with the
years 1880, 1890 and 1900 on the horizontal axis and the
following scales.]

1. Expressed as percentages of values in 1880.

Scales     %       A       B

         200     440     320
         150     330     240
         100     220     160
          50     110      80

2. Expressed as percentages of values in 1890.

Scales     %       A       B

         150     660     360
         100     440     240
          50     220     120

3. Expressed as percentages of values in 1900.

Scales     %       A       B

         150     495     600
         100     330     400
          50     165     200

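
The three ways of equating the series A and B can be worked out in a few lines, using only the numbers given for 1880, 1890 and 1900. The function name is illustrative, and results are rounded to one decimal place for display.

```python
YEARS = [1880, 1890, 1900]
A = [220, 440, 330]
B = [160, 240, 400]

def as_percentages_of(series, base_year):
    """Express every value as a percentage of the value in `base_year`."""
    base = series[YEARS.index(base_year)]
    return [round(100 * value / base, 1) for value in series]

for base_year in YEARS:
    print("base", base_year,
          "A:", as_percentages_of(A, base_year),
          "B:", as_percentages_of(B, base_year))
```

With 1900 as base, the values for 1880 come out at about 67 per cent. for A and 40 per cent. for B; with 1880 as base, the values for 1900 are 150 and 250 per cent.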


[Illustration from trade with Germany.] These points are fully
illustrated by the annexed diagrams, the object of which is to
analyse the progress of our trade with our colonies and with
foreign countries, especially Germany. The first figure shows the
total imports and exports, and the parts of each which are
colonial and foreign, the scale in millions of pounds being
the same for all the lines. A line is also given for imports from
Germany, Holland, and Belgium; these are grouped together,
because it was not possible till 1904 to distinguish in the returns
from the two latter home manufactures from German goods in
transit. It is not clear from this diagram which part of our
imports has increased most rapidly. The three lines are,
therefore, redrawn in the second diagram, on a percentage scale,
all the values being expressed as percentages of the
corresponding values in 1905. It is now seen that imports from
foreign countries and from our colonial possessions and India
have marched together except during the period of the cotton
famine, but the trade from Germany, etc., has increased more
rapidly than either. If we had equated the quantities in 1862,
the German line would have far outpassed the others by 1905;
but the impression given would be erroneous as regards absolute
quantities, for the increase was only £71,100,000 for the one,
while it was £277,000,000 for all foreign countries. The
remaining diagram shows the relative rates of increase for
Germany, Holland and Belgium, and the British possessions
respectively, since 1870.


Imports and Exports, 1862-1905.

Unit in all columns, £100,000.

Year         Total       Total     Exports     Exports     Imports     Imports     Imports
           Imports     Exports          to          to        from        from        from
                     including     British     Foreign     British     Foreign    Germany,
                    Re-exports Possessions   Countries Possessions   Countries Holland and
                                                                                   Belgium

1862         2,257       1,662         454       1,207         653       1,604         279
1863         2,489       1,969         550       1,419         847       1,642         283
1864         2,749       2,126         557       1,569         937       1,812         332
1865         2,711       2,188         515       1,673         728       1,982         364
1866         2,953       2,389         572       1,817         722       2,231         388
1867         2,752       2,258         534       1,724         607       2,144         373
1868         2,947       2,278         537       1,741         670       2,277         379
1869         2,955       2,370         519       1,851         704       2,250         405
1870         3,033       2,441         554       1,887         648       2,384         409
1871         3,310       2,836         556       2,280         729       2,581         469
1872         3,547       3,146         656       2,490         794       2,753         455
1873         3,713       3,110         711       2,399         810       2,903         463
1874         3,701       2,977         779       2,197         822       2,879         494
1875         3,739       2,816         767       2,050         844       2,895         515
1876         3,752       2,568         701       1,866         843       2,908         516
1877         3,944       2,523         758       1,766         896       3,049         590
1878         3,688       2,455         720       1,735         779       2,908         575
1879         3,630       2,488         665       1,823         789       2,840         543
1880         4,112       2,864         815       2,049         925       3,187         616
1881         3,970       2,971         867       2,104         915       3,055         582
1882         4,130       3,067         923       2,143         994       3,136         658
1883         4,269       3,054         904       2,150         987       3,282         692
1884         3,900       2,960         883       2,077         958       2,942         646
1885         3,710       2,715         885       1,860         844       2,866         638
1886         3,499       2,690         822       1,867         819       2,680         609
1887         3,622       2,813         823       1,990         838       2,784         646
1888         3,876       2,986         917       2,068         869       3,007         684
1889         4,276       3,156         908       2,248         973       3,304         715
1890         4,207       3,283         945       2,337         962       3,245         694
1891         4,354       3,091         933       2,158         995       3,360         716
1892         4,238       2,916         812       2,104         979       3,259         715
1893         4,047       2,771         786       1,986         919       3,128         720
1894         4,083       2,738         786       1,952         940       3,143         716
1895         4,167       2,858         761       2,098         957       3,210         729
1896         4,418       2,964         907       2,057         933       3,485         761
1897         4,510       2,941         871       2,071         941       3,569         760
1898         4,705       2,940         901       2,038         998       3,708         786
1899         4,850       3,295         943       2,352       1,069       3,781         834
1900         5,231       3,544       1,021       2,523       1,096       4,134         861
1901         5,220       3,479       1,132       2,347       1,057       4,163         897
1902         5,284       3,492       1,176       2,317       1,069       4,215         950
1903         5,426       3,604       1,195       2,409       1,137       4,289         973
1904         5,510       3,710       1,208       2,502       1,200       4,310         962
1905         5,650       4,076       1,227       2,849       1,279       4,372         990


[Diagrams: TRADE of BRITISH POSSESSIONS and FOREIGN COUNTRIES.
One figure shows Total Imports and the other lines in millions of
pounds; another shows IMPORTS as PERCENTAGES of THEIR TOTAL
VALUES IN 1905, with lines for imports from British Possessions,
from Foreign Countries, and from Germany, Holland & Belgium, and
with separate scales for Germany and general percentages, for
Foreign Countries, and for British Possessions.]
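
The contrast between relative and absolute growth in this illustration can be verified from the first and last rows of the table of imports (unit £100,000). The sketch below uses only those printed values; the dictionary keys and variable names are illustrative.

```python
# Imports in 1862 and 1905, unit £100,000, from the table of imports and exports.
imports = {
    "British Possessions": {1862: 653, 1905: 1279},
    "Foreign Countries": {1862: 1604, 1905: 4372},
    "Germany, Holland and Belgium": {1862: 279, 1905: 990},
}

summary = {}
for source, values in imports.items():
    first, last = values[1862], values[1905]
    summary[source] = {
        "increase": last - first,                        # absolute growth, unit £100,000
        "ratio": last / first,                           # growth when equated in 1862
        "percent_of_1905": round(100 * first / last, 1)  # 1862 on the 1905 scale
    }

for source, figures in summary.items():
    print(source, figures)
```

The increase for Germany, Holland and Belgium comes to 711 units, that is £71,100,000, against 2,768 units, or about £277,000,000, for all foreign countries, although the ratio of growth is the larger for the former.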

The  International  Institute  of  Statistics  has  considered  the 
possibility  of  standardising  historical  diagrams  for  comparison, 
and  resolved  at  its  meeting  in  1911  that  the  average  of  the 
figures  for  the  years  1901-10  should  be  taken  as  the  standard 
and  that  this  average  should  be  represented  by  a  vertical  height 
equal  to  the  horizontal  measurement  that  represented  thirty 
years.  Diagrams  drawn  on  this  standardised  scale  can  then 
readily  be  compared  with  one  another  whatever  quantities  they 
represent.  It  is  not  intended  to  prevent  other  comparisons 
being  made  (as,  for  example,  those  on  the  diagram  facing 
p.  146),  nor  diagrams  that  represent  series  all  expressed  in  the 
same  units  (£  or  tons)  being  drawn  with  the  same  natural  unit. 
The  intention  is  that  the  standard  should  be  adopted  as  the 
only  form  where  there  is  no  reason  to  the  contrary,  and  as  an 
alternative  form  in  other  cases.  Comparison,  especially  of 
international  statistics,  will  be  greatly  facilitated  if  these  rules 
are  followed. 
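The rule of the standardised scale can be put as a short calculation. The sketch below assumes one reading of the resolution: the average of the base years is drawn at a height equal to thirty times the width given to one year. The function name, the `year_width` parameter and the series are hypothetical.

```python
def standard_heights(series, base_years, year_width=1.0):
    """Vertical heights of a series on the standardised scale.

    The average over base_years is drawn at a height equal to the
    horizontal length of thirty years (30 * year_width).
    """
    base_average = sum(series[y] for y in base_years) / len(base_years)
    unit = 30.0 * year_width / base_average  # height per unit of the quantity
    return {year: value * unit for year, value in series.items()}

# Hypothetical series for 1901-10, for illustration only.
series = {year: 200.0 + 10.0 * (year - 1901) for year in range(1901, 1911)}
heights = standard_heights(series, range(1901, 1911))
average_height = sum(heights.values()) / len(heights)
```

Whatever the units of the original quantity, the average of the base years falls at the same height, so two diagrams drawn in this way can be laid side by side.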

B. Series of figures are often compared graphically with a
view to discovering or illustrating causal relations. In such
cases we do not study relative growth only as in
the last diagram discussed, but look throughout
the period for any signs of resemblance in rates of growth, dates
of maxima and minima, or synchronism in any changes. The
methods  by  which  such  comparisons  are  made  are  difficult,  and 
need  careful  analysis.  For  instance,  we  may  wish  to  consider 
whether  an  increase  of  the  allowance  for  outdoor  relief  is  con¬ 
nected  with  an  increase  of  pauperism.  In  this  case  one  line 
will represent money, the other the number of persons, and there
is no common unit; we need not calculate percentages, but
having  chosen  any  scale  for  money,  we  can  make  equality  in 
any  year  by  a  simple  adaptation  of  the  scale  for  number.  We 
shall  wish  to  learn  first,  whether  an  increase  or  decrease  of 
money  occurred  at,  or  just  before,  an  increase  or  decrease  in 
number;  and  secondly,  whether  the  greater  the  increase  of 
one  the  greater  the  increase  of  the  other.  In  order  to  show 
direct  connection,  we  shall  try  to  make  one  line  lie  as  nearly  as 
possible  over  the  other. 

Draw  a  preliminary  diagram  in  which  both  lines  are  entered 
on  any  scales ;  this  will  suggest  the  resemblances  to  be  tested. 

Notice in what period the fluctuations are greatest;
this in general should be the period to be taken,
for  it  is  here  that  the  causal  relations  have  had  most  play. 
If  any  other  period  is  chosen  for  any  special  reasons,  these 
should  be  made  clear,  for  otherwise  a  critic  may  legitimately 
object  that  it  is  only  in  this  period  that  the  connection  is 
distinct.  There  would  be  little  difficulty  in  finding  short 
periods  in  any  two  curves  where  the  fluctuations  synchronised. 
Take the averages of both money and of number over the
period  chosen,  and  draw  a  second  diagram  in  which  the  scale 
for  number  is  chosen  by  making  this  average  for  number  equal 
to  the  corresponding  average  for  money.  Any  correspondence 
between  the  two  lines  can  be  at  once  detected. 
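The choice of the second scale described above amounts to multiplying one series by a constant so that the two averages over the chosen period coincide. A minimal sketch follows; the function names and the figures for money and number are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def rescale_to_common_average(money, number, period):
    """Rescale `number` so that its average over `period` equals that of `money`."""
    avg_money = mean([money[y] for y in period])
    avg_number = mean([number[y] for y in period])
    factor = avg_money / avg_number
    return {year: value * factor for year, value in number.items()}, factor

# Hypothetical figures, for illustration only.
money = {1880: 100.0, 1881: 120.0, 1882: 110.0}      # allowance, in pounds
number = {1880: 2000.0, 1881: 2500.0, 1882: 2100.0}  # persons relieved

scaled_number, factor = rescale_to_common_average(money, number, [1880, 1881, 1882])
```

Here the rescaled numbers come out at about 100, 125 and 105, to be read against money values of 100, 120 and 110 on the same scale.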

There are many cases when the changes in the magnitudes
which we regard as the causes are in the opposite direction
to those in the magnitudes which we regard as the
effects. For instance, if we are comparing trade
improvement  with  the  number  of  unemployed,  and  make  the 
construction  just  described,  the  maxima  of  the  first  line  would 
synchronise  with  the  minima  of  the  second.  Greater  clearness 
can  be  obtained  by  inverting  one  of  the  diagrams,  plotting  out 
the  number  employed  instead  of  that  unemployed,  and  then  the 
changes  should  be  in  the  same  sense  in  both  lines. 

In the above construction the lines will only lie one over the
other throughout their fluctuations, if the changes in one
quantity are in strict proportion to the changes
in the other, if an increase of 10 per cent. above the
average, for instance, in the allowance for outdoor relief
corresponded to one of 10 per cent. in the number of paupers. It
is very rare that such a simple relation is found; all we can see
in general is that the maxima and minima occur at the same
dates, that the fluctuations agree throughout in sense in both
series, and that the greater fluctuations in the one correspond
to the greater fluctuations in the other.

[Diagram, to face page 155. Figure I: imports and exports per
head, price of wheat per quarter (with its own scale), and marriage
rate. Figure II: marriage rate and imports and exports per head;
averages 1869-96 on same line; same base line. Figure III: marriage
rate and imports and exports per head; averages 1869-96 on same
line; mean deviations (1869-96) from averages equal.]

Diagrams may often be used to suggest correlation between
two series of figures, and this indeed is one of their chief merits,
and they may be used to illustrate arguments on
the subject, but at this point their utility ends, for
they cannot be made to prove much. Causal relations are very
difficult to establish, and the original figures must be critically
consulted when theories are to be brought to the test.

We have not yet exhausted the power of diagrams for
making such comparisons, but the following method must be
applied only with great caution. Suppose that we
wish to ascertain whether an increase of 1 bushel in
the quantity of wheat to be bought for a sovereign corresponded
to an increase of 1.5 in the marriage rate per 1000, or any
such strict numerical proportion. Draw a diagram representing
the quantities of wheat, take the average for the period chosen
for comparison, and write the scale so as to read 1, 2, 3 . . .
bushels above or below the average. Draw no base line. Now
enter a line to represent the excess or defect of the marriage
rate from its average in the chosen period, on a scale such that
1.5 in excess is represented by the same vertical distance as
1 bushel. The closeness of the two lines would test to what
extent  the  theory  was  valid.  The  danger  of  this  method  is, 
that  with  no  base  line  there  is  no  possibility  of  judging  the 
amounts  of  the  changes  relative  to  the  totals.  The  insertion 
of  the  necessary  two  base  lines  would  confuse  rather  than 
aid. 
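The construction just described can be imitated numerically: take deviations from the average in each series, divide the marriage-rate deviations by the supposed ratio, and see how far apart the two lines lie. The sketch below is an illustration under assumptions; the function names are hypothetical, the mean absolute gap is one possible measure of closeness not named in the text, and the data are invented so that the proportion holds exactly.

```python
def deviations(values):
    """Excess or defect of each value from the average of the series."""
    average = sum(values) / len(values)
    return [v - average for v in values]

def mean_gap(wheat, marriage, ratio=1.5):
    """Average vertical distance between the two lines when `ratio`
    units of marriage rate are drawn to the same height as 1 bushel."""
    wheat_dev = deviations(wheat)
    marriage_dev = [d / ratio for d in deviations(marriage)]
    return sum(abs(w - m) for w, m in zip(wheat_dev, marriage_dev)) / len(wheat)

# Hypothetical data, for illustration only.
wheat_bushels = [4.0, 5.0, 6.0]     # bushels bought for a sovereign
marriage_rate = [14.5, 16.0, 17.5]  # per 1000

gap = mean_gap(wheat_bushels, marriage_rate, ratio=1.5)
```

A gap near zero means that the two lines coincide under the assumed ratio; the larger the gap, the less the strict proportion is borne out.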

It  is  clear  from  the  preceding  analysis  that,  by  the  choice 
of  scales  and  base  lines,  the  points  at  any  two  dates  may  be 
made  to  coincide  on  any  number  of  accurately  drawn  lines 
representing  series  of  figures. 

The  preceding  paragraphs  are  completely  illustrated  by  the 
adjoining  diagram. 

In Figure I are given lines representing the price of wheat in
shillings per quarter, the total of values of exports and imports
divided by the population, and the marriage rate
per 1000. The scales chosen are simply those
which are easiest to use, and throw the lines into proper relief.
The points in each scale for the same years are over one another,
but the scales differ. The base lines need not coincide.

Marriage Rate, Total Exports and Imports per Head of
Population, and Average Price of Wheat per Quarter.

Year    Marriage    Total Exports and    Average Price of
        Rate        Imports per Head     Wheat per Quarter
                     £    s.   d.            s.   d.
1860    17.1        13    0    8            53    3
1861    16.3        13    0    3            55    4
1862    16.1        13    8    0            55    5
1863    16.8        15    2    7            44    9
1864    17.2        16    8    7            40    2
1865    17.5        16    7    5            41   10
1866    17.5        17   14    5            49   11
1867    16.5        16    9    6            64    5
1868    16.1        17    0    6            63    9
1869    15.9        17    3    9            48    2
1870    16.1        17   10    3            46   10
1871    16.7        19    9    6            56    8
1872    17.4        21    0    0            57    0
1873    17.6        21    4    2            58    8
1874    17.0        20   11    0            55    8
1875    16.7        19   19    4            45    2
1876    16.5        19    0   10            46    2
1877    15.7        19    5    5            56    9
1878    15.2        18    2    1            46    5
1879    14.4        17   16   10            43   10
1880    14.9        20    3    3            44    4
1881    15.1        19   17    5            45    4
1882    15.5        20    8   10            45    1
1883    15.5        20   13    2            41    7
1884    15.1        19    4    1            35    8
1885    14.5        17   16    9            32   10
1886    14.2        17    0   10            31    0
1887    14.4        18   11    7            32    6
1888    14.4         ?    ?    1            31   10
1889    15.0        19   19    9            29    9
1890    15.5        19   19    7            31   11
1891    15.6        19   14    0            37    0
1892    15.4        18   15    6            30    3
1893    14.7        17   14    9            26    4
1894    15.1        17   11    9            22   10
1895    15.0        17   19    3            23    1
1896    15.8        18   14    1            26    2

(? marks figures illegible in this copy.)

We can see at a glance whether there is resemblance between
the courses of these figures. There is at any rate a general
correspondence between the fluctuations of trade
and of marriage rate since 1870, and possibly
earlier. There are points of likeness between wheat prices
and trade; in 1870-73 both rise together, and fall in 1873-75;
both rise in 1876-77, fall in the following two years, and then
rise again; both fall from 1881 to 1886 and then rise. There
are also many cases in which the motions do not agree, especially
1862-64, and 1887-89.

If we look now at the price of wheat and the marriage rate,
which in the earlier part of the century used to be closely
related, the one rising when the other fell, we see
that there is no great resemblance either in this
or the contrary sense. In 1860-62 and in 1862-64 wheat rose
and  fell,  while  the  marriage  rate  fell  and  rose;  wheat  rose  in 
1865-67,  while  the  marriage  rate  was  first  stationary  and  then 
fell  a  little ;  then  it  continued  to  fall  in  1868-70,  though  wheat 
was  falling  also ;  in  1870-80  the  marriage  rate  shows  one  long, 
wheat  two  short,  fluctuations.  Since  1880,  in  years  in  which 
wheat  fell,  the  marriage  rate  in  general  fell  also  and  vice  versa. 

Let us consider for a moment the possible links of connection
between these phenomena. When wheat was the chief
object of expenditure of the working class, its
price was the chief thing for them to consider;
and so when wheat rose the marriage rate fell. On the other
hand,  now  that  wheat  is  cheap  and  wages  higher,  a  change  in 
the  price  of  the  loaf  is  only  of  great  importance  to  a  minority ; 
it  is  now  the  general  prosperity  of  the  country,  well  indicated  by 
the  condition  of  foreign  trade,  that  raises  the  marriage  rate. 

When  exports  and  imports  are  increasing  in  value,  trade  is 
stimulated,  and  in  spite  of  rising  prices,  marriageable  people  are 
sanguine  that  the  prosperity  will  remain  and  the  prices  fall ;  but 
when  the  prices  fall,  so  do  the  profits  and  incomes,  and  marriage¬ 
able  people  are  more  prudent.  For  these  reasons  we  may 
expect  the  marriage  rate  and  foreign  trade  lines  to  resemble 
each other.

Now, since an increase of the marriage rate corresponds to an
inflation of trade, and an inflation of trade to a time of rising
prices in general, we shall find the price of wheat in particular,
which  is  connected  with  the  course  of  prices  in  general,  rising 
when  trade  is  inflated  and  falling  when  it  is  depressed,  and 
therefore  rising  and  falling  with  the  marriage  rate.  But  since 
the  price  of  wheat  is  influenced  also  by  special  causes,  it  will  not 
always  correspond  to  the  state  of  trade,  and  still  less  to  the 
marriage  rate,  with  its  former  tendency  to  opposite  variations. 

There is no need then for surprise that the curves of marriage
rate and trade correspond; that wheat and trade correspond,
but less closely; and that wheat and marriage show a double
tendency.  The  correspondence  between  marriage  and  trade  is 
investigated  on  the  diagram.  That  between  wheat  and  trade 
should  be  done  on  an  identical  method.  Marriage  and  wheat 
should  be  compared  twice  on  different  plans  :  first  for  direct 
correspondence,  and  then  by  redrawing  the  wheat  curve  with 
its  base  line  at  the  top  for  inverse  correspondence. 

To effect the comparison between the course of trade and
the marriage rate, the following steps are taken. On examining
the two curves on the first figure, it is seen that
the resemblance does not begin before 1869;
the parts of the curves since 1869 should therefore be brought
into close correspondence. The average marriage rate, 1869-96,
is 15.5, and average imports and exports per head, £19. The
marriage curve is drawn in the ordinary way; then with the
help of a sliding scale the trade curve is put in, so that with
the same base line £19 falls on the 15.5 line in Figure II.

The  result  is  that  the  curves  are  seen  to  rise  and  fall  at  the 
same  dates,  but  not  to  the  same  extent ;  for,  while  the  lines 
keep  nearly  parallel  from  1873  to  1879,  the  falls  from  the 
maximum  being  equal,  after  1879  the  trade  line  fluctuates 
further  above  and  below  its  average  than  the  marriage  rate  does. 

It remains to test graphically whether the changes are
proportional to one another. An equation of scales may be
obtained by equating the mean deviation (£1.04)
of imports and exports from their average 1869-96
with the mean deviation of the marriage rate (.72) from its
average in the same period; or roughly taking the same vertical
scale to represent £1 of imports and .7 in the marriage rate.
This is making the hypothesis that a change of £1 in the total
trade per head synchronises with a change of .7 in the marriage
rate per thousand. The scales so chosen are marked above
and below the common average line in Figure III.
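The figures quoted for the marriage rate can be checked from the table. The sketch below uses the marriage-rate column for 1869-96 as read from the table (a few garbled entries in this copy are taken as 15.1 and 15.5); the mean deviation of trade, £1.04, is taken from the text and not recomputed, and the function names are illustrative.

```python
# Marriage rate per 1000 for 1869-96, as read from the table.
marriage_rate = [
    15.9, 16.1, 16.7, 17.4, 17.6, 17.0, 16.7, 16.5, 15.7, 15.2,
    14.4, 14.9, 15.1, 15.5, 15.5, 15.1, 14.5, 14.2, 14.4, 14.4,
    15.0, 15.5, 15.6, 15.4, 14.7, 15.1, 15.0, 15.8,
]

def mean(values):
    return sum(values) / len(values)

def mean_deviation(values):
    """Average absolute deviation of the values from their own mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)

marriage_average = mean(marriage_rate)
marriage_md = mean_deviation(marriage_rate)

# Mean deviation of trade per head, in pounds, as quoted in the text.
trade_md = 1.04

# Units of marriage rate drawn to the same height as one pound of trade.
marriage_units_per_pound = marriage_md / trade_md
```

The average comes to about 15.5 and the mean deviation to about .72, agreeing with the text; the ratio is close to the rough figure of .7 in the marriage rate for each £1 of trade.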

It  is  now  seen  that  the  fluctuations  since  1870  lie  more 
closely  together  in  the  two  curves,  but  that  this  closeness  has 
been  obtained  by  the  partial  sacrifice  of  the  years  before  1870. 
A  yet  shorter  period,  1879-93,  would  show  a  very  close  agree¬ 
ment  ;  but  so  special  a  selection  would  vitiate  any  general 
argument. 

Our  conclusion  is,  that  since  1870  the  causes  which  affect 
foreign trade have also affected the marriage rate at the same
dates and in the same sense, and that the more marked the
effects  on  the  one,  the  more  marked  are  the  effects  on  the  other 
also,  but  that  there  is  no  law  of  simple  proportion  between  them. 

Instead  of  making  comparison  of  the  deviations  from  the 
average  of  a  period,  it  is  legitimate  and  often  advantageous 
to  measure  the  deviations  from  a  smooth  curve,  whether 
obtained  from  moving  averages  or  by  some  other  method. 
We  are  then  ignoring  the  causes  which  have  a  gradual  and 
permanent  effect,  and  comparing  the  short-period  fluctuations. 
We  return  to  this  subject  below  (Part  II,  end  of  Chap.  VI). 
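Measuring deviations from a smooth curve can be sketched as follows, assuming the smooth curve is a centred moving average over an odd number of years; other smoothing methods would serve equally. The function names and the short series are hypothetical.

```python
def moving_average(values, window):
    """Centred moving average; `window` is assumed to be odd."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

def deviations_from_trend(values, window):
    """Deviation of each value from the moving average centred on it."""
    half = window // 2
    trend = moving_average(values, window)
    middle = values[half:len(values) - half]
    return [v - t for v, t in zip(middle, trend)]

# A steadily rising series has no short-period fluctuation at all.
steady = [1.0, 2.0, 3.0, 4.0, 5.0]
steady_dev = deviations_from_trend(steady, 3)

# The same rise with a disturbance in the middle year.
disturbed = [1.0, 2.0, 6.0, 4.0, 5.0]
disturbed_dev = deviations_from_trend(disturbed, 3)
```

The gradual rise disappears from the deviations, while the disturbance in the middle year remains; the first and last years drop out because no centred average can be formed for them.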

Note. The relations tested in Figure II may be represented
by the equation $\frac{x}{a} = \frac{y}{b}$, and in Figure III by
$\frac{x - a}{y - b} = c$ (a constant), where x and y stand for
the value of trade and the
marriage rate, and a and b for their average values, and c is
chosen so as to make the average fluctuations of the two sets
of quantities equal. By the method of least squares c could
be chosen so that the correspondence should be closer than
with the value given by the calculation in the text.
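One way of choosing c by least squares is sketched below. It assumes that the quantity to be made small is the sum of the squares of (x - a) - c(y - b), which gives c as the sum of products of the deviations divided by the sum of squares of the deviations of y; the Note does not say which form was intended, and the function name and data are hypothetical.

```python
def least_squares_c(x, y):
    """Value of c minimising the sum of ((x - a) - c * (y - b)) ** 2,
    where a and b are the averages of x and y."""
    a = sum(x) / len(x)
    b = sum(y) / len(y)
    numerator = sum((xi - a) * (yi - b) for xi, yi in zip(x, y))
    denominator = sum((yi - b) ** 2 for yi in y)
    return numerator / denominator

# Hypothetical data in which trade deviates exactly twice as much
# as the marriage rate, for illustration only.
trade = [17.0, 19.0, 21.0]     # pounds per head
marriage = [14.0, 15.0, 16.0]  # per 1000

c = least_squares_c(trade, marriage)
```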

4.  Periodic  Figures. 

We now come to the consideration of periodic figures; that
is, of figures which within a given period, in a year for instance
when returns are monthly, reach maxima and
minima at assigned times, and show fluctuations
recurring with regularity in successive periods. In physical
phenomena,  such  as  the  sunrise,  the  same  daily  numbers  will 
represent  the  phenomena,  almost  without  change,  year  after 
year.  In  the  case  of  the  tides  we  find  a  link  between  the 
more  rigid  annual  curves  of  seasonal  phenomena,  and  the  less 
marked  periods  of  social  statistics ;  for  the  tides  are  subject  to 
separate  influences  with  periods  of  24  hours,  24  hours  50  min., 
29  days,  1  year,  and  others,  and  the  effects  of  these  influences 
are  often  masked  one  by  the  other.  In  the  weekly  figures  of 
the  Bank  of  England,  Jevons  discovered  monthly,  quarterly, 
and  annual  periods.* 

In social and industrial statistics we usually find an annual
period, combined with a general slow movement upwards or
downwards, and confused by an irregular period of about ten
years, due to alternate inflation and depression of trade. The
influences of these three movements on the resulting numbers
can be investigated, and the general methods of examining
periodic figures fully explained by the complete discussion of one
example, viz., the monthly returns of want of employment of
the Friendly Society of Ironfounders. For another example the
reader is referred to Jevons’ essay, On the Frequent Autumnal
Pressure in the Money Market;* and for an exercise, to the
monthly gazette wheat prices, where the gradual change of the
shape of the annual diagram can be traced in relation with
the increasing influence of harvests in all the quarters of the
globe.

* See Investigations in Currency and Finance.

These figures are specially suitable for showing graphically
a double period, and the influences of rapid annual fluctuations
and general movements of longer period on each
other. Looking at the table on p. 161 along the
lines for the several years, we shall see that there is always a
fall in the middle of the year. Looking down a vertical column
under any month, it will be seen that there is no generally
marked tendency towards increase or diminution, for high and
low numbers occur in the first as well as the last few years.
The most noticeable feature of these figures is the alternation
of groups of years of high and of low numbers. Percentages
above 10 will be found in 1857-58, 1861-63, 1866-70, 1876-81,
1884-87,  and  1892-93.  Let  us  choose  for  examination  the 
period  1866-70.  The  figure  for  January  1866  is  below  the 
Januaries  of  previous  years;  those  of  February,  March,  and 
April  are  also  low;  from  May  to  September  the  figures  are 
greater  than  those  of  1865  or  1864 ;  from  October  to  December 
they  are  greater  than  those  of  1863, 1864,  or  1865 ;  in  December 
1867  they  are  greater  than  any  previous  year.  Most  of  the 
figures  for  1868  are  higher  than  in  the  nine  previous  years; 
but  from  September  1868  onwards  the  figure  is  lower  than  the 
one  twelve  months  earlier  till  September  1872.  This  wave 
of  unemployment  then  lasted  from  May  1866  to  September 
1872. 

Now let us watch the seasonal influence. In 1866 there
was no fall in the summer except in April, and there was a very
rapid rise in December. In 1867 a fall from April to
August was followed by a rapid rise for four months. There
is a fall from December 1867 to September 1868,
but a rise follows in October, November, and
December; since the rise does not generally begin till after
August, it will be seen that the general fall did not much delay
the seasonal effect. In the next year, 1869, there is a fall to
a lower minimum in August, but now the rise in December
is very slight; next year the fall is very quick to August, but
the seasonal rise is not delayed. From this it is clear that the
seasons had their effect throughout the fluctuation except in
the opening year 1866, when there was no fall, and that the
rises in the autumn were very much accentuated. Almost
identical remarks would apply to the period August 1875 to
May 1881. In what month was the condition of employment
1867-70 at its worst? The greatest figure given is 22.6 per
cent. in December 1867, but unemployment in December is
generally greater than in any other month, and the figures
for any of the following six months may be more unusual;
the determination of the exact date will be best shown by
diagrams. It may be mentioned that most of these remarks
were suggested by Mr. Hey, the former secretary of the
Ironfounders’ Society, who drew up these figures.

PERIODIC FIGURES.

Number of Unemployed Ironfounders, expressed as percentages
of estimated total number of members, month by month: calculated
from figures given in the Annual Report of the Friendly Society of
Ironfounders, 1894.

Year               Jan.  Feb.  Mar. April   May  June  July  Aug. Sept.  Oct.  Nov.  Dec.  Average
                                                                                          for Year
1855               11.1  14.1  14.0  12.5  10.0   9.9   8.7   8.7   6.8   7.7   8.8  12.0     10.4
1856               10.9  12.6  12.2  10.0   9.4   7.5   6.9   7.3   6.9   8.1   8.7   9.9      9.2
1857               10.1   9.5   8.7   8.7   8.1   7.3   6.8   6.9   6.2   8.0  14.0  17.7      9.3
1858               20.2  20.6  20.9  19.8  20.3  17.8  15.9  14.3  13.1  11.9  11.5  11.2     16.5
1859               10.6   8.8   6.5   5.2   4.0   4.4   3.2   3.6   3.4   3.8   4.6   5.1      5.3
1860                4.0   3.2   2.6   2.2   1.6   1.7   2.3   2.6   2.6   2.9   3.7   5.6      2.9
1861                6.0   6.9   6.5   7.9   7.8   8.4   6.9   7.9   9.5  10.7  12.4  13.8      8.7
1862               14.5  14.0  14.0  14.6  14.4  13.7  13.3  12.9  12.2  13.5  14.9  16.0     14.0
1863               15.5  13.9  13.6  11.6  10.4   9.3   8.1   7.8   7.4   6.6   5.3   5.0      9.5
1864                6.0   7.1   6.6   5.1   4.4   3.3   2.8   2.8   2.6   3.3   4.2   8.1      4.7
1865                5.4   5.3   5.3   4.6   3.4   2.9   2.6   3.1   2.7   2.6   2.3   4.9      3.8
1866                4.2   5.4   [?]   3.6   5.1   6.5   5.9   6.5   6.9   7.4   9.3  13.8      6.7
1867               12.4  13.2  15.4  16.7  14.9  14.6  14.2  13.9  15.7  16.3  18.9  22.6     15.7
1868               22.1  20.9  19.8  18.6  16.7  15.8  14.9  14.7  14.2  14.1  15.6  17.4     17.1
1869               17.3  17.1  16.8  15.6  15.2  13.6  13.3  11.8  13.1  13.6  14.8  15.3     14.8
1870               14.5  10.9   8.7   7.2   5.0   4.5   3.7   4.5   4.9   5.0   5.6   8.3      6.9
1871                7.2   5.6   3.6   2.8   1.6   1.5   1.6   1.2   0.9   1.4   1.1   2.2      2.6
1872                1.1   1.1   0.9   0.8   1.2   0.7   0.9   1.0   1.3   1.8   2.6   4.1      1.5
1873                3.3   2.8   2.7   2.5   2.1   2.0   3.0   4.9   4.3   3.3   3.3   5.1      3.3
Average 1855-73    10.3  10.2   9.7   8.9   8.2   7.7   7.1   7.2   7.1   7.5   8.5  10.4      8.6
1874                4.9   3.9   3.9   3.5   4.9   3.9   3.8   3.4   3.5   3.7   3.9   5.0      4.0
1875                4.6   3.4   3.5   2.8   2.8   2.8   3.3   3.4   3.6   4.1   4.1   5.0      3.6
1876                4.9   4.9   4.9   5.4   4.8   5.2   5.7   5.8   6.4   6.4   6.2  10.3      5.9
1877                7.7   7.4   7.0   6.9   8.4   7.6   7.4   7.8   9.6  10.9  12.3  16.3      9.1
1878               14.0  14.3  13.5  15.3  13.3  14.6  13.6  13.2  13.3  14.0  15.7  21.0     14.7
1879               23.2  23.8  24.7  25.5  22.3  23.4  21.5  22.6  22.5  21.1  18.0  16.6     22.1
1880               15.2  12.9  11.1  10.0  10.0   9.7   9.8  10.0  10.0   9.2   9.2  10.2     10.6
1881               11.5  10.8  10.1  10.1   7.6   7.5   6.5   5.8   5.6   5.4   5.0   6.6      7.7
1882                5.5   5.2   5.3   4.5   3.6   3.8   3.2   3.4   3.6   4.1   4.4   6.0      4.4
1883                3.6   4.8   5.2   4.3   4.2   3.6   3.9   4.3   4.3   4.2   4.0   6.6      4.4
1884                6.1   6.2   5.9   6.5   6.5   6.9   6.5   7.6   8.1   7.8   9.8  10.9      7.4
1885               10.2  11.1  10.0  10.1   9.8   9.1   9.8  10.7  11.8  11.6  12.7  13.6     10.9
1886               14.1  15.0  15.2  15.5  13.4  13.1  12.1  12.7  13.6  13.9  12.7  12.9     13.7
1887               12.4  11.6  10.2   9.1   9.2  10.6   9.2   8.8   9.6   9.4   9.4   9.1      9.9
1888                7.8   7.5   6.4   6.4   5.9   5.2   5.7   5.0   5.1   4.8   3.2   3.5      5.5
1889                3.1   3.3   2.4   2.1   1.7   1.6   1.7   1.7   1.6   1.5   1.2   1.4      1.9
1890                1.3   1.3   3.2   3.1   2.8   2.4   2.4   2.7   2.7   2.7   2.7   2.7      2.5
1891                3.9   3.5   4.2   4.2   4.6   4.0   4.5   4.8   5.4   5.6   5.7   6.3      4.7
1892                7.0   7.2   7.9   8.1   7.9   7.9   7.7   7.6   9.3  11.4  10.9  12.0      8.7
1893               11.5  11.2  10.1   7.7   9.6   8.3   8.3   9.2  11.7  11.9  11.5  11.5     10.2
Average 1874-93     8.6   8.5   8.2   8.1   7.7   7.6   7.3   7.5   8.1   8.2   8.1   9.4      8.1
Average 1855-93     9.4   9.3   8.9   8.5   7.9   7.6   7.2   7.4   7.6   7.9   8.3   9.9      8.3

[?] One monthly figure for 1866 is missing in this copy; its
position (taken here as March) is uncertain.

If we now turn to the diagram, the following facts may be
noticed. The thick line showing the annual average percentages
shows a downward tendency till 1857, followed
by an abrupt rise and fall in 1858-60, then
two years’ rise to its original height, returning to a minimum
in 1865; the next wave covers seven years, and is marked by an
extraordinarily sharp rise in 1867, and a very low minimum in
1872. The exceptional condition of trade in 1872 could not
last, but the rise is very gradual to 1876, when the next cycle
of trade is marked again by a six years’ wave; the rise is
not so steep as in the former fluctuation, but lasts longer, and
a higher point is reached: the fall is at about the same angle,
and the minimum in 1882 is about the same as that in 1865.
The next wave came before it appeared to be due, and lasted
seven instead of six years, but was much more moderate, and
again the rise was sharper than the fall. The minimum of
1889 did not endure, and the figure ends with a suggestion
that the maximum will be in 1894, but only at a moderate
height, and the next minimum might be expected in 1898
or 1899, if causes similar to those which influenced earlier trade
depressions were still acting. It may be found, in fact, from
the Board of Trade returns, that, taking all the trade unions who
made returns together, the maximum month was December
1892, and the maximum year was 1893; after this the fall is
regular to 1897, and a trifling rise in 1898 is followed by a very
low figure for 1899.*

[Figure 5. Average year by year and smooth curve (7 yearly averages).]

In  Figure  5  the  diagram  is  inverted  and  greatly  compressed, 
showing  now  the  percentage  employed.  If  the  period  1876-82 
is  cut  off  by  two  vertical  lines,  readers  may  see  how  great  were 
the  amounts  of  labour  lost  to  the  country  and  wages  to  the 
members  of  the  Ironfounders’  Society  in  those  years.  These 
figures  show  a  want  of  employment  due  to  special  causes  in  this 
Society  more  than  twice  as  great  as  in  other  Unions  whose 
returns  are  available  for  the  same  period. 

In Figure 5 the annual averages are smoothed by the method
explained above (pp. 136-7), a seven-yearly average † being
taken to correspond to the general wave length. It will be
seen that there is no very marked tendency up or down in the
thirty-nine years, and that the smooth line is never far from
the general average of employment, 91.7.
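The smoothing and the general average can be reproduced from the yearly column of the table. The sketch assumes that the seven-yearly average is a centred moving average, which may differ in detail from the method of pp. 136-7; the yearly figures are those read from the table (a few garbled entries taken at their evident values), and the names are illustrative.

```python
# Average percentage unemployed in each year, 1855 to 1893, from the table.
yearly_unemployed = [
    10.4, 9.2, 9.3, 16.5, 5.3, 2.9, 8.7, 14.0, 9.5, 4.7,   # 1855-64
    3.8, 6.7, 15.7, 17.1, 14.8, 6.9, 2.6, 1.5, 3.3,        # 1865-73
    4.0, 3.6, 5.9, 9.1, 14.7, 22.1, 10.6, 7.7, 4.4, 4.4,   # 1874-83
    7.4, 10.9, 13.7, 9.9, 5.5, 1.9, 2.5, 4.7, 8.7, 10.2,   # 1884-93
]

def centred_moving_average(values, window=7):
    """Average of each run of `window` consecutive years (window odd)."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]

# Smoothed values are centred on the years 1858 to 1890.
smooth = centred_moving_average(yearly_unemployed, 7)

general_unemployed = sum(yearly_unemployed) / len(yearly_unemployed)
general_employed = 100.0 - general_unemployed
```

The general average of employment so found is about 91.7, as stated in the text.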

The comparison of this diagram with that illustrating
exports (p. 134) is very instructive. Some of the results may
be thus exhibited:

Minima of    Dates of Maxima     Maxima of    Dates of Minima
Exports.     of Unemployment.    Exports.     of Unemployment.

1862         1858 and 1862       1866         1865
1868         1868                1872         1872
1879         1879                1882         1882 or 1883
1886         1886                1890         1889
1894         1893

The  figures  may  also  be  compared  graphically  by  the  methods 
of  the  previous  or  following  sections. 

The averages for the nineteen Januaries, nineteen Februaries,
etc., in the years 1855-73, and similar averages
for the years 1874-93, and the whole period are
given in the table and exhibited in Figures 2, 3, 4.


*  See  Annual  Abstract  of  Labour  Statistics,  1895,  p.  73,  for  various 
methods  of  treating  these  figures  similar  to  those  here  discussed. 

† For smoothing and studying periodic curves, see Professor Poynting’s
paper in Statistical Journal, 1884, and Professor Moore’s in 1919.

When we calculated the annual averages just discussed we
eliminated  by  that  process  the  seasonal  fluctuations;  by  this 
new  series  of  averages  we  eliminate  the  influences  of  particular 
years.  If  we  took,  for  instance,  all  the  November  numbers  out 
of  a  series  of  figures  totally  uninfluenced  by  the  seasons,  if  such 
could  be  found,  and  compared  these  with  the  general  average 
for  all  months,  we  should  in  the  long  run  find  just  as  many 
instances  above  as  below  this  average ;  but  if  the  figures  were 
influenced  by  the  seasons,  we  should  find  a  considerably  greater 
number  above  than  below,  or  vice  versa.  The  greater  the 
seasonal  influence,  the  greater  would  be  this  excess  or  defect. 
Averaging  numbers  in  this  way  tends  to  eliminate  the  non- 
seasonal  causes,  for  by  hypothesis  the  excesses  and  defects  due 
to  them  will  in  the  long  run  balance  one  another ;  and  except 
by  averaging  these  cannot  be  eliminated,  unless  they  can  be 
actually  calculated.  The  excess  of  the  November  average 
above  the  general  average  will  be  greater  than  that  of  October, 
if  the  seasonal  causes  exert  more  influence  towards  excess  in  the 
former  than  in  the  latter  month,  and  the  curve  which  shows 
these  averages  will  show  a  resemblance  to  that  which  would 
be  obtained  if  the  non-seasonal  causes  were  absent.  It  will 
be  only  a  resemblance  for  two  reasons  :  first,  because  in  the 
comparatively  short  series  of  years  with  which  we  are  generally 
obliged  to  be  content,  a  very  effective  non-seasonal  cause  will 
leave  its  mark  on  the  average,  as  may  be  seen  in  the  table  on 
p.  161 ;  secondly,  because  seasonal  and  non-seasonal  causes  are 
often  not  independent ;  a  depression  of  trade  is  accentuated  by 
a  sharp  winter ;  a  bad  season  in  a  year  of  bad  trade  may  increase 
the  want  of  employment  greatly  and  suddenly,  while  a  good 
summer  in  a  prosperous  year  may  reduce  it  almost  to  zero. 
In  the  case  we  are  considering  the  interaction  of  causes 
tends  to  exaggerate  the  seasonal  maximum  and  diminish  the 
minimum;  in  other  cases  a  compensating  effect  might  be 
found. 
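
The averaging described above can be put in a few lines. What follows is a minimal sketch and not the author's own arithmetic: the function name `seasonal_deviations` and the small data set are hypothetical, chosen so that a known seasonal pattern is laid on top of different yearly levels and then recovered.

```python
from statistics import mean

def seasonal_deviations(monthly_by_year):
    """Excess or defect of each month's average over the general average.

    monthly_by_year: one list of 12 values (Jan..Dec) per year.
    """
    general_average = mean(v for year in monthly_by_year for v in year)
    month_averages = [mean(year[m] for year in monthly_by_year)
                      for m in range(12)]
    return [avg - general_average for avg in month_averages]

# Hypothetical data: the same seasonal pattern on four different yearly levels.
pattern = [1.0, 0.5, 0.0, -0.5, -1.0, -1.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.0]
yearly_levels = [3.0, 5.0, 8.0, 4.0]
data = [[level + s for s in pattern] for level in yearly_levels]

deviations = seasonal_deviations(data)
for month, d in enumerate(deviations, start=1):
    print(month, round(d, 2))
```

In this artificial case the yearly levels cancel completely and the pattern is returned unchanged; with real returns, as the text warns, a strong non-seasonal cause in a short run of years will still leave its mark.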

In  Figures  2,  3,  4  the  curve  for  the  latter  half  of  the  year 
is  prefixed  to  that  of  the  calendar  year,  because  the  character 
of  the  yearly  waves  is  seen  most  clearly  from  minimum  to 
minimum.  It  may  be  noticed  that  the  wave  in  Figure  3  is 
less  definite  in  shape  and  has  a  smaller  rise  and  fall  than  that 
of  the  earlier  period  shown  in  Figure  2 ;  it  would  appear  that 
the  seasons  are  losing  their  influence. 

[Side-note: The annual wave.]

If there is a definite annual period, that represented by
Figure 4, it may be expected that a figure of a shape similar
to this

[Figure: small sketch of the annual wave.]

will be repeated annually in Figure 1; it is shown well in 1864,
1882, and other years. In the great majority of cases the yearly
maximum is reached in December or January; at the end of 1858
the maximum is absent, but is replaced by a break in the rapidity
of the fall; at the end of 1860 there is a rise, but the spring
fall following is checked by the general upward trend; similar
remarks apply to all the great fluctuations. There is no doubt
that right along the line we find at nearly equal intervals these
pointed crests above the line of averages.

The  minima  are  not  so  conspicuous,  for  the  pointed  shape 
is  absent,  trifling  causes  bring  them  near  the  smoothed  line,  and 
they  are  easily  masked  by  a  general  fall  or  are  absent  because 
of  a  general  rise.  In  1861,  however,  there  is  a  distinct  minimum 
in  spite  of  the  strong  upward  tendency ;  the  minima  are  very 
conspicuous  throughout  the  fluctuation  of  1865-70 ;  and  from 
1859 to 1888 the minima are fairly marked, except in 1876,
1880,  and  1881. 

The following figures show the effect of a stationary, rising,
and  falling  average  annual  rate  on  the  shape  of  the  seasonal 
wave  : — 

[Figures a, b, c, each drawn for two successive years, January to December:
a. Seasonal wave on stationary line of averages.
b. Seasonal wave superimposed on rising line of averages.
c. Seasonal wave superimposed on falling line of averages.]

These figures are drawn by adding or subtracting the average
monthly differences from the general average

Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
+1.1  +1.0  +.6   +.2   -.4   -.7   -1.1  -.9   -.7    -.4   0     +1.6

month by month to or from the positions shown on the straight
lines joining the annual averages. On a rising line the spring
fall tends to become horizontal and the autumn rise steeper;
on a falling line the spring fall becomes more rapid and the
autumn rise is checked.
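
The construction of Figures a, b, and c may be imitated as follows. This is only a sketch under stated assumptions: the monthly differences are taken as read from the printed table (a few of its decimal points are indistinct), the line of averages is simplified to a straight line with a constant change per month, and the starting levels and slopes are hypothetical.

```python
# Average monthly differences from the general average, Jan..Dec,
# as read from the table in the text (some digits are uncertain in the print).
MONTHLY_DIFFERENCES = [1.1, 1.0, 0.6, 0.2, -0.4, -0.7,
                       -1.1, -0.9, -0.7, -0.4, 0.0, 1.6]

def wave_on_line(start_level, change_per_month, n_months=24):
    """Seasonal wave added month by month to a straight line of averages."""
    return [start_level + change_per_month * i + MONTHLY_DIFFERENCES[i % 12]
            for i in range(n_months)]

def spring_fall(series):
    """Change from January to July of the first year."""
    return series[6] - series[0]

def autumn_rise(series):
    """Change from July to December of the first year."""
    return series[11] - series[6]

stationary = wave_on_line(5.0, 0.0)    # hypothetical level
rising = wave_on_line(5.0, 0.3)        # hypothetical slope
falling = wave_on_line(10.0, -0.3)     # hypothetical slope

for name, s in [("stationary", stationary), ("rising", rising),
                ("falling", falling)]:
    print(name, round(spring_fall(s), 1), round(autumn_rise(s), 1))
```

With these assumed slopes the spring fall shrinks almost to nothing on the rising line and the autumn rise is checked on the falling one, which is the behaviour the text describes.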

If  this  seasonal  wave,  added  to  the  slower  long-period 
changes,  were  the  complete  explanation  of  these  numbers, 
Figure  1  (p.  162)  would  be  entirely  composed  of  modifications 
of Figures a, b, and c. Figure a is exemplified especially in
1855-57, 1864-65, 1871-73; Figure b in 1860-61, 1866-67,
1877-78, 1883-85; Figure c in 1859, 1863, 1880-82, 1886-89.

[Side-note: Elimination of fluctuations.]

As explained above, the two sets of causes are not independent,
and these figures are not reproduced exactly; but the
resemblance is sufficiently close to make the following method
of eliminating seasonal fluctuations partially applicable.
Combine the monthly excesses and defects just given with the
original numbers, by subtracting the excesses and adding the
defects; this process should tend to produce a straight line thus:

[Figure: a portion of the curve from Figure 1 shown together with
the corrected figures.]

But  the  result  is  not  more  than  a  tendency,  because  of  the 
unusual  fall  in  January  1883,  and  it  is  difficult  to  find  a  perfect 
example. This method is applied in Figures 6, 7, and 8 in an
attempt  to  disentangle  the  seasonal  fluctuations  from  the  effects 
of  the  commercial  crisis  of  1872,  the  depression  of  1879,  and  the 
turn  of  the  tide  in  1883.  In  Figure  6  it  is  seen  that  January  1872 
was  the  best  month  relatively,  though  the  absolute  minimum 
was  not  reached  till  June  of  that  year ;  from  this  it  appears  that 
January  1872  was  the  turning  point  of  the  great  inflation,  a  date 
somewhat  earlier  than  that  generally  given.  The  date  of  the 
maximum  of  1879  is  left  unchanged  by  this  process,  and  that  of 
the  1889  minimum  is  only  shifted  one  month. 
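
The correction just described, subtracting the monthly excesses and adding the defects, can be sketched thus. The function name `remove_seasonal` and the test series are hypothetical, and the monthly differences are again taken as read from the printed table; the sketch shows the arithmetic only and makes no claim about the particular corrected figures plotted in Figures 6, 7, and 8.

```python
# Average monthly excesses (+) and defects (-), Jan..Dec, as read from the text.
MONTHLY_DIFFERENCES = [1.1, 1.0, 0.6, 0.2, -0.4, -0.7,
                       -1.1, -0.9, -0.7, -0.4, 0.0, 1.6]

def remove_seasonal(monthly_values, first_month_index=0):
    """Subtract the excess (or add the defect) proper to each month.

    first_month_index: 0 if the series begins in January, 1 for February, etc.
    """
    return [value - MONTHLY_DIFFERENCES[(first_month_index + i) % 12]
            for i, value in enumerate(monthly_values)]

# Hypothetical series: a constant level of 6.0 carrying the seasonal wave only.
observed = [6.0 + d for d in MONTHLY_DIFFERENCES]
corrected = remove_seasonal(observed)
print([round(v, 1) for v in corrected])
```

Where the seasonal wave is the only disturbance, as in this artificial series, the corrected figures fall on a straight line; in practice the result is no more than a tendency, as the text observes.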

[Side-note: Criteria of existence of period.]

We have still to discuss the criteria of the existence of a
period. In Figure 1 the optical evidence is sufficient to suggest
the annual period, but it may be doubted whether an annual
fluctuation would be suggested by a diagram representing wheat
prices. It is clear that if the monthly entries of any returns
whatever were averaged in months over any period of years, the
averages for January, February, etc., would not be exactly equal,
even if there were no seasonal influence. The following diagrams
show various averages:

[Four diagrams of monthly averages, January to December:
unemployed ironfounders, as before; wheat prices, shillings per
quarter, 1862-76; wheat prices, shillings per quarter, 1877-91;
average date of first Sunday in month, 1881-1900.]

Of these the first three may be expected to be seasonal, while
the  last,  which  shows  the  averages  of  the  dates  on  which  fell  the 
first  Sunday  in  20  Januaries,  20  Februaries,  etc.,  in  a  series  of 
years,  certainly  is  not. 

The  following  simple  tests  may  be  applied  to  decide  this 
point.  If  the  period  is  in  any  way  connected  with  the  seasons, 
it  will  correspond  to  some  extent  to  the  ordinary  weather  charts 
of  temperature,  etc.,  which  have  a  single  annual  maximum  and 
corresponding  minimum.  Phenomena  affected  by  the  weather 
may also be expected to show a single maximum, nearly coinciding
with the maximum or minimum temperature; thus the
maximum unemployed coincides with the minimum length of
daylight and precedes the minimum temperature. In some
cases a second subsidiary maximum may be shown, since, for
example, an excessive death rate may be due to excessive cold
or heat; but even in this example further analysis would probably
show that the one maximum was for the old, the other
for  the  young.  Wheat  prices  may  also  show  two  minima  due 
to  the  harvests  in  the  two  hemispheres.  The  “  Sunday  ”  curve 
just  given  shows  four  maxima,  and  is  not  seasonal.  More  than 
one  maximum  is  evidence  against  periodicity  till  a  reason  is 
found  for  their  existence. 
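
The first test, counting the maxima in the curve of monthly averages, lends itself to a short sketch. The function `count_annual_maxima` is hypothetical; it treats December as adjacent to January and counts strict peaks only, so that a flat top shared by two months would be missed. The first series is the set of monthly differences for the unemployed as read from the text; the second is an invented series with four peaks, standing in for a curve like the "Sunday" one.

```python
def count_annual_maxima(month_averages):
    """Number of months standing strictly above both neighbours,
    the year being treated as a closed circle (Dec. next to Jan.)."""
    n = len(month_averages)
    return sum(
        1 for i in range(n)
        if month_averages[i] > month_averages[i - 1]
        and month_averages[i] > month_averages[(i + 1) % n]
    )

# Monthly differences for the unemployed, Jan..Dec, as read from the text.
unemployed = [1.1, 1.0, 0.6, 0.2, -0.4, -0.7,
              -1.1, -0.9, -0.7, -0.4, 0.0, 1.6]
# Invented series with a peak every third month.
four_peaks = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

print(count_annual_maxima(unemployed))
print(count_annual_maxima(four_peaks))
```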

[Side-note: Probability test.]

The second test is to look at the serial diagram and notice
how often the maximum occurs in the same month; non-periodic
causes will hide the maximum occasionally, but in the long run
one month will be predominant. In Figure 1 the maximum occurs
in March and April twice each, in February three times, in
January eleven times, and in December twenty-one times. The
maximum is then generally in midwinter. The minimum is not in
this case so well defined.

The following table shows how this analysis can be extended:

                                                       Times
                                                     out of 39.
The percentage of December is greater than that
    of the preceding November . . . . . . . . . . .     33
The percentage of December is greater than that
    of the following January . . . . . . . . . . .      28
The percentage of December is greater than that
    of the preceding July . . . . . . . . . . . . .     33
The percentage of December is greater than that
    of the following July . . . . . . . . . . . . .     30


The  chances  against  so  great  a  preponderance,  if  the  seasons 
had no influence, are respectively about 65,000 to 1, 160 to 1,
65,000  to  1,  and  1200  to  1.*  All  the  months  may  be  separately 
tested  in  the  same  way.  This  method  by  no  means  exhausts 
the  evidence,  for  we  have  only  considered  which  of  two  months 
is  the  greater,  and  not  how  great  is  the  excess  when  it  exists. 
On  this  point  the  reader  is  referred  to  the  paper  by  Professor 
Edgeworth, On Methods of Statistics, in the Jubilee Volume
of  the  Royal  Statistical  Society,  p.  206 ;  this  should,  however, 
be  postponed  till  the  mathematical  treatment  which  follows  in 
Part  II  has  been  studied. 
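
The chances against such preponderances can be examined by supposing, as the argument does, that either month is equally likely to be the greater when the seasons have no influence. The sketch below sums the exact binomial terms; the function name `odds_against` is hypothetical. The odds it prints need not agree with the round figures quoted above, which presumably rest on the approximate treatment reserved for Part II.

```python
from math import comb

def odds_against(successes, trials):
    """Odds against at least `successes` out of `trials` when each
    trial is an even chance (exact binomial tail)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    total = 2 ** trials
    return (total - tail) / tail

for times in (33, 28, 33, 30):
    print(times, "out of 39:", round(odds_against(times, 39)), "to 1")
```

The same function serves for testing any other pair of months, since only the number of years and the number of times one month exceeds the other enter into it.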


5.  Logarithmic  Curves. 

A serious flaw in the graphic method as used in the previous
sections is that, when we are dealing with a series of increasing
figures, though the totals year by year may be increasing, we
are compelled to represent equal increments on these totals by
equal vertical distances; thus an increment of £20 on a total of
£20 is represented by the same vertical distance as an increment
of £20 on a total of £2000. Thus in the annexed figure
representing exports, the fall from £52,000,000 to £42,000,000 in
1815-16 is barely noticeable, though it is a fall of 20 per cent.,
and was connected with very great distress in the manufacturing
districts; while the fall from £305,000,000 in 1883 to
£269,000,000 in 1886 attracts attention immediately, though it is
one of 12 per cent. only. Again the increase of 34 per cent. which
took place between 1848 and 1850 appears insignificant in
comparison with that of 29 per cent. from 1870 to 1872. When we
are  attacking  questions  of  causation  it  very  frequently  happens 
that  we  are  more  concerned  to  know  the  proportionate  increase 
than  the  actual  increase.  When  we  are  considering  the  gradual 
growth  of  our  foreign  trade,  or  when  we  are  comparing  the 
growth  of  trade  of  two  countries,  a  diagram  like  that  annexed 
is  likely  to  give  quite  a  wrong  impression  of  the  struggle  that 
marked  the  early  stages.  We  need  then  a  diagram  not  of 
quantities,  but  of  ratios,  where  equal  vertical  distances  represent 
no  longer  equal  absolute  increments,  but  equal  proportional 
increments,  that  is,  equal  rates  of  increase.  By  the  use  of 
logarithms  a  universal  scale  can  be  constructed  which  serves 


*  See  Part  II,  Sect.  I,  infra. 
this purpose. The non-mathematical student can easily
accustom himself to the use of diagrams so constructed, by
studying one where the actual amounts represented are entered,
and noticing that whatever part of the scale he takes, doubling,
halving, increasing by 20 per cent. and so on, are always
represented by the same vertical distances respectively. The
construction of a diagram on this scale is as follows: Write down
the numbers in the series to be represented; against them write
down their logarithms;
on  paper  divided  into  equal  squares  mark  at  equal  intervals  on  a 
vertical  line  numbers  ascending  in  regular  progression  so  as  to 
include  all  the  logarithms  found;  mark  off  the  dates  on  a 
horizontal  line;  and  on  the  scale  thus  prepared  mark  in 
the  logarithms,  instead  of  the  original  numbers.  The  table  on 
p.  173  and  the  diagram  facing  p.  171  show  the  figures  of  imports 
and  exports  thus  treated.  On  the  right  hand  of  figure  2  the 
position of the absolute numbers is given; on the left the
corresponding logarithms. A given vertical distance, 1 inch,
represents the distance .301 on the logarithmic scale; if we add
this quantity to the logarithm of any number, we obtain the
logarithm of twice that number, for log a + .301 = log a
+ log 2 = log 2a; for instance, if we increase the height of
the position which represents £30 by 1 inch, we arrive at the
position which represents £60. Again if we now add 1.59 of an
inch, which represents .477 on the logarithmic scale, that is
log 3, to the logarithm of 2a, we obtain log 6a, and we have

    log 6a = .477 + log 2a = .477 + .301 + log a, as above,
           = .778 + log a = log 6 + log a;
that  is,  we  arrive  at  the  same  position  on  this  scale  whether  we 
go  by  means  of  two  separate  ratios  or  by  a  single  compounded 
ratio.  Thus  a  diagram  drawn  on  this  principle  satisfies  the 
necessary  conditions  that  equal  vertical  distances  represent  the 
same  ratio  in  whatever  part  of  the  scale  they  are  taken,  and 
that  any  number  of  points  can  be  entered  without  leading  to 
inconsistencies.  At  the  end  of  this  section  is  given  a  table  of 
the  logarithms  of  1  to  1000,  correct  to  the  third  decimal  place, 
which  will  be  found  sufficient  for  this  purpose. 
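
The two properties claimed for the scale, that equal ratios occupy equal vertical distances and that successive ratios compound without inconsistency, may be checked numerically. This is a sketch only: the function `height_in_inches` is hypothetical, and it adopts the scale of the text, on which 1 inch stands for log 2.

```python
from math import log10

# Scale assumed from the text: 1 inch represents log 2 (about .301).
INCHES_PER_LOG_UNIT = 1 / log10(2)

def height_in_inches(amount):
    """Vertical position of an amount on the logarithmic scale."""
    return log10(amount) * INCHES_PER_LOG_UNIT

# Doubling is the same distance wherever it is taken on the scale.
rise_30_to_60 = height_in_inches(60) - height_in_inches(30)
rise_2000_to_4000 = height_in_inches(4000) - height_in_inches(2000)

# Trebling after doubling lands where a single sixfold step would.
rise_60_to_180 = height_in_inches(180) - height_in_inches(60)
rise_30_to_180 = height_in_inches(180) - height_in_inches(30)

print(round(rise_30_to_60, 2), round(rise_2000_to_4000, 2))
print(round(rise_60_to_180, 2), round(rise_30_to_180, 2))
```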

[Side-note: Examples of its use.]

Thus on the diagram given we can find at once that imports
were doubled in value between 1811 and 1836, again between
1839 and 1853, again between 1855 and 1866, and that their
value increased 40 per cent. between 1886 and 1899. Or we may
notice that the excess of the value of imports over that of
exports was 40 per cent. of the latter both in 1850 and in 1880;
that the value of imports in 1899 was thrice that of exports in
1860.

[Figure 1. Growth of Imports and Exports in the XIXth Century.]

If  the  eye  has  been  carefully  educated  to  understand  a 
diagram of this sort, if the fact that it is a diagram of ratios,
not of quantities, is firmly impressed on the mind, then the
diagram  answers  perfectly  the  object  of  the  graphic  method, 
that  is,  it  gives  a  true  instantaneous  impression  of  a  complex 
series  of  facts.  If,  on  the  other  hand,  it  is  found  that  a  true 
impression  is  not  received,  through  inability  to  take  the  right 
mental  position,  then  diagrams  on  the  natural  scale  should  be 
employed  only,  always  with  the  recollection  that  they  may  give 
false  impressions  of  ratio.* 

[Side-note: Velocity and acceleration.]

It is to be noticed that no base line should be given in
diagrams of this class, otherwise a false impression is at once
obtained. Notice further that, while equal vertical differences
represent equal ratios from any part of the diagram to any
other, instead of equal increments as
on  the  natural  scale,  equal  degrees  of  slope  represent  equal  ratios 
of  increase  (equal  accelerations),  instead  of  equal  additions  in 
equal  times  as  on  the  natural  scale  (equal  velocities).  On  the 
logarithmic  scale  a  line  rising  with  convexity  to  the  horizontal 
shows  that  the  ratio  of  increase  is  growing,  as  in  imports  from 
1830-53  (if  the  line  is  smoothed),  while  concavity,  as  from  1854 
to  1873,  shows  a  slackening ;  but  on  the  natural  scale  the  line  is 
convex  almost  throughout  the  two  periods,  showing  that  the 
actual  increments  were  increasing  all  the  time. 
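
The contrast between the two scales can be seen in a series that grows by a constant ratio. The figures below are hypothetical (a steady increase of 10 per cent. a year from 100): on the natural scale the yearly increments grow continually, while on the logarithmic scale every yearly step is the same, so that the line is straight.

```python
from math import log10

# Hypothetical series increasing by a constant 10 per cent. each year.
series = [100 * 1.1 ** t for t in range(6)]

# Natural scale: additions in equal times.
increments = [b - a for a, b in zip(series, series[1:])]
# Logarithmic scale: vertical steps in equal times.
log_steps = [log10(b) - log10(a) for a, b in zip(series, series[1:])]

print([round(x, 1) for x in increments])
print([round(x, 4) for x in log_steps])
```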

* Professor Marshall suggests a simple method of correcting this false
impression in his paper On the Graphic Method of Statistics, in the Jubilee
Volume of the Journal of the Royal Statistical Society, p. 257 seq.

[Side-note: Useful application to index-numbers.]

It would be useful, if space permitted, to offer several
diagrams on both scales; for in many series of figures the
differences exhibited by the two methods are very instructive.
One case may be signalised where the logarithmic scale is
specially important, that is, when the original numbers
represent ratios, not actual numbers. Thus in Mr. Sauerbeck's
well-known diagram, drawn on the natural scale, representing his
index-numbers of prices, all the numbers included are percentages
of their values in certain defined years. Suppose that 100, 80,
and 60 are the index-numbers for three years, then on the natural
scale the decrements are represented by equal distances and
appear to be equal.
The  changes  in  the  value  of  gold,  however,  are  by  no  means 
equal  in  the  two  periods.  In  the  first,  the  fall  from  100  to  80 
is  one  of  20  per  cent. ;  16s.  at  the  second  date  would  buy  goods 
which  cost  £1  at  the  first.  In  the  second,  the  fall  from  80  to  60 
is  one  of  25  per  cent. ;  15s.  at  the  last  date  would  buy  goods 
which  cost  £1  at  the  middle  date.  For  the  purposes  of  price 
index-numbers  it  is  ratios  which  are  important  and  which  the 
diagram  should  represent. 
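
The arithmetic of this example is easily verified. The sketch below uses only the three index-numbers of the text, together with the reckoning of 20 shillings to the pound; the variable names are of course hypothetical.

```python
from math import log10

index_numbers = [100, 80, 60]
pairs = list(zip(index_numbers, index_numbers[1:]))

# Equal decrements on the natural scale ...
decrements = [a - b for a, b in pairs]
# ... but unequal proportional falls.
falls_per_cent = [100 * (a - b) / a for a, b in pairs]
# Shillings needed at the later date to buy what cost 20s. at the earlier.
shillings_needed = [20 * b / a for a, b in pairs]
# Vertical drops on the logarithmic scale, which differ as the ratios do.
log_drops = [log10(a) - log10(b) for a, b in pairs]

print(decrements, falls_per_cent, shillings_needed)
print([round(d, 3) for d in log_drops])
```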

[Side-note: Comparisons on the logarithmic scale.]

The logarithmic scale has special uses in the comparison of
series of figures, and the methods discussed in the section
devoted to that subject can be readily adapted. The difficulty
of the choice of units in comparing quantities of different
natures disappears when we deal only with ratios; we need no
longer trouble about the method of percentages. In investigating
causal relations we are more likely to find close connection in
ratios than in quantities; for if one set of phenomena is
connected with another, it is more likely that the relation will
be a proportional one (e.g., that an increase of 10 per cent. in
some measurable characteristic of the one corresponds to an
increase of 8 per cent. in a characteristic of the other), than
an absolute quantitative one (e.g., that an increase of 2s. in a
price, at whatever point it stands, corresponds to a decrease of
100 in the number of purchasers). Resemblance between two curves
on the logarithmic scale will mean the correspondence in
proportional change, while resemblance on the natural scale means
correspondence in absolute change.

There  is  less  trouble  in  this  new  method  in  equating  averages 
than  before.  For  if  the  logarithms  of  two  series  are  taken,  it  is 
quite  immaterial  at  what  height  on  a  logarithmic  scale  the  two 
are  plotted  out ;  alteration  of  height  only  means  multiplication 
of  all  the  items  by  a  constant  quantity,  and  does  not  alter  the 
appearance  or  proportion  of  their  fluctuations.  The  method  to 
be  employed  is  as  follows  : — Draw  the  curves  representing  two 
series  of  figures  on  a  logarithmic  scale;  then  shift  the  lower 
curve  vertically  upwards  to  and  over  the  other,  till  the  closest 
possible  correspondence  is  obtained ;  draw  it  in  in  this  position, 
and  the  two  series  can  be  accurately  compared. 
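
The shifting of one curve over the other may be sketched numerically. The text shifts by eye until the closest correspondence is obtained; the sketch assumes one simple criterion instead, namely that the averages of the logarithms are made equal. The function `equate_on_log_scale` and the two series are hypothetical.

```python
from math import log10
from statistics import mean

def equate_on_log_scale(lower, upper):
    """Raise the logarithms of `lower` by a constant so that their
    average equals that of the logarithms of `upper`."""
    log_lower = [log10(v) for v in lower]
    log_upper = [log10(v) for v in upper]
    shift = mean(log_upper) - mean(log_lower)
    return [v + shift for v in log_lower], log_upper, shift

# Hypothetical series: the second is seven times the first throughout,
# so that their proportional fluctuations are identical.
first = [20.0, 25.0, 22.0, 30.0]
second = [7 * v for v in first]

shifted, target, shift = equate_on_log_scale(first, second)
print(round(shift, 3))
print([round(a - b, 6) for a, b in zip(shifted, target)])
```

Since the shift is a single constant, it multiplies every item of the lower series by the same quantity and leaves the proportions of its fluctuations untouched.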

Year.   Imports.*   Logarithms.   Exports.†   Logarithms.
         £ mln.                    £ mln.

1800       28         1.447          34         1.531
1801       31         1.491         ...          ...
1802       29         1.462         ...          ...
1803       26         1.415         ...          ...
1804       27         1.431         ...          ...
1805       28         1.447          38         1.580
1806       27         1.431          41         1.613
1807       27         1.431          37         1.568
1808       27         1.431          37         1.568
1809       32         1.505          47         1.672
1810       39         1.591          48         1.681
1811       27         1.431          33         1.518
1812       26         1.415          42         1.623
1813      ...          ...          ...          ...
1814       34         1.531          45         1.653
1815       32         1.505          52         1.716
1816       37         1.568          42         1.623
1817       31         1.491          42         1.623
1818       37         1.568          46         1.663
1819       31         1.491          35         1.544
1820       32         1.505          36         1.556
1821       31         1.491          37         1.568
1822       31         1.491          37         1.568
1823       36         1.556          35         1.544
1824       37         1.568          38         1.580
1825       44         1.643          39         1.591
1826       38         1.580          32         1.505
1827       45         1.653          37         1.568
1828       45         1.653          37         1.568
1829       44         1.643          36         1.556
1830       46         1.663          38         1.580
1831       50         1.699          37         1.568
1832       45         1.653          36         1.556
1833       46         1.663          40         1.602
1834       49         1.690          42         1.623
1835       49         1.690          47         1.672
1836       57         1.756          53         1.724
1837       55         1.740          42         1.623
1838       61         1.785          50         1.699
1839       62         1.792          53         1.724
1840       67         1.825          51         1.708
1841       63         1.799          52         1.716
1842       64         1.806          47         1.672
1843       68         1.832          52         1.716
1844       74         1.869          59         1.771
1845       83         1.919          60         1.778
1846       73         1.863          58         1.763
1847       83         1.919          59         1.771
1848       89         1.949          53         1.724
1849      100         2.000          64         1.806
1850      100         2.000          71         1.851
1851      111         2.045          74         1.869
1852      109         2.037          78         1.892
1853      123         2.090          99         1.996
1854      152         2.182         116         2.064
1855      144         2.158         117         2.068
1856      173         2.237         139         2.143
1857      179         2.252         146         2.164
1858      165         2.216         140         2.146
1859      179         2.252         156         2.193
1860      211         2.324         165         2.217
1861      217         2.336         160         2.204
1862      226         2.354         166         2.220
1863      249         2.396         197         2.295
1864      275         2.439         213         2.328
1865      271         2.432         219         2.340
1866      295         2.470         239         2.379
1867      275         2.439         226         2.354
1868      295         2.470         228         2.358
1869      295         2.470         237         2.375
1870      303         2.481         244         2.387
1871      331         2.519         284         2.453
1872      355         2.550         315         2.498
1873      371         2.569         311         2.492
1874      370         2.568         298         2.475
1875      374         2.573         282         2.450
1876      375         2.574         257         2.410
1877      394         2.596         252         2.401
1878      369         2.567         245         2.389
1879      363         2.559         249         2.396
1880      411         2.614         286         2.456
1881      397         2.599         297         2.473
1882      413         2.616         307         2.487
1883      427         2.630         305         2.484
1884      390         2.591         296         2.471
1885      371         2.569         271         2.432
1886      350         2.544         269         2.429
1887      362         2.558         281         2.448
1888      388         2.586         299         2.476
1889      428         2.631         316         2.500
1890      421         2.624         328         2.516
1891      435         2.638         309         2.490
1892      424         2.627         292         2.465
1893      405         2.607         277         2.442
1894      401         2.603         274         2.439
1895      417         2.620         286         2.456
1896      442         2.645         296         2.471
1897      451         2.654         294         2.468
1898      471         2.673         294         2.468
1899‡     485         2.685         330         2.519
1900      523         2.719         354         2.549
1901      522         2.718         348         2.542
1902      528         2.723         349         2.543
1903      543         2.735         360         2.556
1904      551         2.741         371         2.569
1905      565         2.752         408         2.611
1906      608         2.784         461         2.664

* Imports: Official values till 1853; real values from 1854.
† Including re-exports.
‡ Value of ships included from 1899.

[Side-note: Equation of fluctuations.]

This method is illustrated by comparing the trade-union
percentage of unemployed with the marriage rate. In Fig. 1,
the numbers are shown on natural scales; in Fig. 2 the averages
over twenty-nine years are equated and the numbers are
shown on a logarithmic scale. We might proceed as on
p. 158, but to use an alternative method, the maxima and minima
in various periods are written down as in the table on p. 175,
and the averages of the fluctuations from maximum to minimum
(expressed as percentages of the maximum) are calculated. It is
found that a fluctuation of 8.4 per cent. in the number employed,
in those trade unions whose returns are accessible,* corresponds
to one of 9.7 per cent. on the marriage rate. To investigate a
possibly closer correspondence, assume that a portion of the
number employed do not influence the marriage rate, and find what
part must be subtracted before this 8.4 per cent. of the total
forms as much as 9.7 per cent. of the remainder; the average
percentage of members of the trade unions at work in the
selected period was 95.1; 8.4 per cent. of this is 7.99, which
forms 9.7 per cent. of 82.4. Thus 12.7, the difference between
95.1 and 82.4, may be considered as not influencing the question,
and subtracted throughout before logarithms are taken.
This process would be replaced on the natural scale by equating
the averages of two series, and drawing one base line so far
below the other that average fluctuations would be represented
by the same vertical distance for both series; which process is
exactly equivalent to that adopted on p. 158. Expressed
algebraically, we are now investigating the equation

    log (y - c) - log x = k, a constant,

where c and k are constants to be so selected as to give
the closest fit, and y and x are the quantities to be
compared.
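
The quantity to be subtracted follows from the three figures quoted in the text, and the steps can be retraced as below. The function `constant_to_subtract` is a hypothetical name for that piece of arithmetic; it is a sketch of the calculation described and not a general method of fitting c and k.

```python
def constant_to_subtract(average_y, fluctuation_y, fluctuation_x):
    """Part of y to be set aside so that its average fluctuation, taken
    as a percentage of the remainder, equals that of x.

    average_y      average level of y (here the percentage employed)
    fluctuation_y  average fluctuation of y, per cent. of its level
    fluctuation_x  average fluctuation of x, per cent. of its level
    """
    absolute_fluctuation = average_y * fluctuation_y / 100
    remainder = absolute_fluctuation * 100 / fluctuation_x
    return average_y - remainder, remainder, absolute_fluctuation

# Figures quoted in the text: 95.1, 8.4 per cent., and 9.7 per cent.
c, remainder, absolute_fluctuation = constant_to_subtract(95.1, 8.4, 9.7)
print(round(absolute_fluctuation, 2), round(remainder, 1), round(c, 1))
```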

In  the  adjacent  diagrams,  Fig.  1  gives  the  figures  in  the 
natural  scale ;  Fig.  2  gives  them  on  the  logarithmic  scale,  after 
they  have  been  arranged  so  as  to  make  average  percentage 
fluctuations  equal ;  while  in  Fig.  3  the  shorter  period,  1880-96, 
is  treated  in  a  method  precisely  similar  to  that  of  Fig.  2.  The 
actual  numbers  and  logarithms  are  given  on  the  next  page. 

*  The  figures  in  columns  2  and  4  in  the  second  table  on  the  next  page 
are  taken  from  Mr.  G.  H.  Wood's  paper  on  Some  Statistics  of  Working  Class 
Progress since 1860, Statistical Journal, 1900, where a valuable logarithmic
diagram  will  be  found,  illustrating  many  of  the  points  of  this  section. 


MARRIAGE RATE AND EMPLOYMENT.

[Fig. 1. Comparison in 1865-93, on natural scale.]
[Fig. 2. The same; logarithmic scale.]
[Fig. 3. Comparison in 1880-1896; same scale as Figure 2.]
Marriage Rate per 1,000.

Years.     Maxima.   Minima.   Differences.   % of Max.

1869         ...      15.9
                                   1.7           10
1873        17.6      ...
                                   3.2           18
1879         ...      14.4
                                   1.1            7
1882-83     15.5      ...
                                   1.3            8
1886         ...      14.2
                                   1.4            9
1891        15.6      ...
                                    .9            6
1893         ...      14.7

Average                                          9.7

Percentage Employed.

Years.     Maxima.   Minima.   Differences.   % of Max.

1868         ...      91.5
                                   7.4           7.5
1872        98.9      ...
                                  11.4          11.5
1879         ...      87.5
                                  10.6          10.8
1882        98.1      ...
                                   7.6           7.8
1886         ...      90.5
                                   7.4           7.6
1889-90     97.9      ...
                                   5.4           5.5
1893         ...      92.5

Average                                          8.4

Average percentage employed, 1865-93, 95.1; 8.4 per cent. of 95.1 is
9.7 per cent. of 82.4.

Years.   Marriage   Logarithms.   Percentage   Less 12.7.   Logarithms.
          Rate.                   Employed.

1865      17.5        1.243         98.0         85.3         1.931
1866      17.5        1.243         96.9         84.1         1.925
1867      16.5        1.217         92.7         80.0         1.903
1868      16.1        1.207         91.5         78.8         1.896
1869      15.9        1.201         92.6         79.9         1.902
1870      16.1        1.207         95.7         83.0         1.919
1871      16.7        1.223         98.2         85.5         1.932
1872      17.4        1.240         98.9         86.2         1.935
1873      17.6        1.245         98.7         86.0         1.934
1874      17.0        1.230         98.2         85.5         1.932
1875      16.7        1.223         97.5         84.8         1.928
1876      16.5        1.217         96.4         83.7         1.923
1877      15.7        1.196         95.6         82.9         1.919
1878      15.2        1.182         93.7         81.0         1.908
1879      14.4        1.158         87.5         74.8         1.874
1880      14.9        1.173         94.1         81.4         1.911
1881      15.1        1.179         96.5         83.8         1.923
1882      15.5        1.190         98.1         85.4         1.931
1883      15.5        1.190         97.8         85.1         1.930
1884      15.1        1.179         92.6         79.9         1.902
1885      14.5        1.161         91.0         78.3         1.894
1886      14.2        1.152         90.45        77.7         1.890
1887      14.4        1.158         92.6         79.9         1.902
1888      14.4        1.158         95.2         82.5         1.916
1889      15.0        1.176         97.9         85.2         1.930
1890      15.5        1.190         97.9         85.2         1.930
1891      15.6        1.193         96.5         83.8         1.923
1892      15.4        1.187         93.7         81.0         1.908
1893      14.7        1.167         92.5         79.8         1.902

Average               1.196                                   1.916

Logarithms of Numbers 1 to 1,000, Correct to the Nearest Digit in the Third Decimal Place.

A critical account of logarithmic curves, strongly advocating
their use, is given by Professor Irving Fisher, in the Quarterly
Publications of the American Statistical Association, June 1917.

CHAPTER VIII.
ACCURACY.


Introductory. 

[Side-note: The nature of measurement.]

There is not in existence a perfectly accurate measurement,
physical or economical, just as there is no perfectly
straight line or perfect fluid. We can best illustrate the
nature of economic measurements by considering that of physical.
It is easy to weigh substances accurately to 1 gram: then by
obtaining a good balance, we can, as our apparatus is improved,
weigh accurately to a centigram, milligram, and one-tenth of a
milligram; but for accuracy beyond this the balance fails us.
Similarly in measuring angles, the naked eye can distinguish an
object which subtends one-thirtieth of a degree; with a sextant
a measurement can be taken correctly to fifteen seconds of arc;
the Greenwich astronomers can make observations correct to
one-hundredth part of a second, but we again come to a point
beyond which precision is unattainable.

In such cases the result is stated as correct to a milligram,
or whatever it may be; in the same way we speak of an estimated
sum of money correct to a pound.

Physical and statistical measurements.

A task which has considerable resemblance to some statistical
estimates is the measurement of the parallax of the sun, which
determines its distance from the earth. During the eighteenth
century astronomers estimated it as 10″, equivalent to 96,000,000
miles. As methods of observation and instruments were improved,
observers began to agree that the whole number of seconds was
8, but gave various estimates for the first decimal figure. Since
1865 there have been very few estimates which have not given
8 as the nearest figure for this place (8.8″), while more recent
observations agree in making the parallax from 8.76″ to 8.78″.
We may, therefore, consider that the distance is now accurately
known to within 1 in 400. Notice in this connection, first, that
the earlier observations have been subject to corrections;
secondly, that better agreement has been attained as time has
gone on; thirdly, that neither absolute agreement nor absolute
accuracy has yet been obtained. So it is with statistical
measurements; we might instance the gradual settlement of
the curve representing expectation of life, the measurement of
the fall in prices, and the development of wage statistics.

Degrees of possible accuracy.

Again in physical measurements, though we can sometimes
reach a very high degree of accuracy, as, for instance, in the
weight of a cubic foot of water, which could doubtless be known
correctly to one part in a million, in other cases we are glad
if we can measure to one part in ten, as, for instance, in the
distance of the nearest fixed star from us, which is, roughly,
from 34 to 37 billion miles. So in statistics
it  is  something  if  we  know  that  the  total  capital  of  the  United 
Kingdom  was  between  7J  and  10  thousand  million  pounds  in 
1885,  or  if  we  know  that  the  average  weekly  wage  of  working¬ 
men  in  full  work  was  from  21s.  to  27s.  in  1886.  The  weak  point 
in  such  statements  is  that  often  when  we  have  made  an  esti¬ 
mate,  which  we  know  to  be  inexact,  we  are  not  able  to  give  any 
estimate  of  the  limits  of  the  error.  We  are  not  so  definite  as 
The  Modern  Traveller  who 

“  .  .  .  knew  the  weather  to  a  T, 

The  longitude  to  a  degree, 

The  latitude  exactly.” 

We  are  not  able  to  say  “  our  estimate  is  24s.  $d. ;  this  is  prob¬ 
ably  correct  within  3 d.,  and  it  is  not  possible  that  we  are  as 
much  as  6 d.  wrong  ” ;  whereas  in  physical  measurements  we 
can  often  give  the  result  as  correct  to  the  smallest  graduation 
of  the  instrument  employed. 

The accuracy generally needed.

On the other hand, though we cannot obtain exactness, we
can in many cases estimate to that degree of accuracy which is
required for practical purposes. In common use only a certain
conventional accuracy is needed. Thus, to take some
miscellaneous instances, the area of an estate is given in
acres, roods, and poles, but not correct to square
yards; the market prices of shares do not change less than
XY ;  we  keep  the  day,  not  the  hour,  of  our  birth ;  railway 
time-tables  do  not  show  seconds ;  ocean  steamers  are  timed  to 
start at certain hours, not minutes; height is measured correct
to one-tenth of an inch; a hundred yards race is timed to
one-tenth of a second. Similarly in statistical estimates, we
seldom need that our results shall be accurate within one per
thousand, or even 1 per cent. One per thousand of the working
week is less than three minutes; 1 per cent. of the week's wage
is only 6d. We do not care to know the population of London
within 100, the expenditure of the Exchequer within £1000, or the
expectation of life within a day. It is often possible to attain
practical accuracy within such limits.

Definition of Error. For purposes of measurement we
may take the following definition: The relative error in an
estimate is the ratio of the difference between the estimate and the
true value, to the estimate; the error is to be reckoned positive
when the true value exceeds the estimate.

Thus if the average weekly wage of agricultural labourers
was in reality 14s., and we estimated it as 13s., our error would
be $\frac{14-13}{13} = \frac{1}{13}$, or 7.7 per cent.; if we had
estimated it as 15s., the error would be
$\frac{14-15}{15} = -\frac{1}{15}$, or -6.7 per cent.


In algebraic notation, if $u$ be the measurement of a quantity whose
true value is $u'$, then $\frac{u'-u}{u}$ is the error in the
estimate, which we shall call $e$; so that $e = \frac{u'-u}{u}$,
and $u' = u(1+e)$.* $e$ thus defined is the relative error, while
$ue$ is the absolute error.
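The definition can be checked numerically. The following Python sketch is an illustration added here, not part of the original text; the function names `relative_error` and `absolute_error` are assumed for the purpose, and the figures are those of the agricultural wage example.

```python
def relative_error(estimate, true_value):
    """Relative error as defined in the text: (true - estimate) / estimate."""
    return (true_value - estimate) / estimate


def absolute_error(estimate, true_value):
    """Absolute error u*e, which is simply true value minus estimate."""
    return estimate * relative_error(estimate, true_value)


true_wage = 14.0  # shillings
for estimate in (13.0, 15.0):
    e = relative_error(estimate, true_wage)
    # The sign is positive when the true value exceeds the estimate.
    print(f"estimate {estimate:.0f}s.: e = {e:+.4f}, "
          f"u*e = {absolute_error(estimate, true_wage):+.1f}s.")
    # The true value is recovered as u * (1 + e).
    assert abs(estimate * (1 + e) - true_wage) < 1e-12
```

Reckoning the error on the estimate rather than on the true value means it can be applied directly to a published figure, which is the quantity actually in hand.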


* It is sometimes more convenient to write $u = u'(1+e')$, reckoning
the error relatively to the true value. Then $e' = -e + e^2$
approximately, and when $e$ is less than 10 per cent. we may take
$e' = -e$.

Statement of errors.

In the nature of things, when we are dealing with errors,
we do not know their magnitude; the most we can know
is their probable and possible extent. We might estimate, for
instance, the percentage of unemployed in a certain year as 4.5,
and add, from information in our possession (coming from a study
of wage-bills or the reports of relief agencies), that we
considered this to be within .5 of the fact; we should then write
the number 4.5 ± .5, meaning that the error in the estimate as
defined above was unlikely to be more than $\frac{.5}{4.5}$, or
11 per cent., the corresponding absolute error being .5. In such
a case we can also give definite limits. The percentage
unemployed must lie between 0 and 100; and if we could actually
enumerate 1 per cent. of the working-class as out of work, and
also 92 per cent. as in work, we should know that the number
required was between 1.0 and 8.0 per cent., and the maximum error
in our estimate, 4.5, was $\frac{3.5}{4.5} = \frac{7}{9}$, or 78
per cent. Even this is more precise than the original statement,
“the percentage is 4.5, error unknown.” By further investigation
we might perhaps bring the limits of error nearer to each other,
and decide that it was practically certain that the percentage
required was between 3.5 and 4.5; then we ought to say “the
number unemployed is .04 ... of the working-class, the
estimate being correct to the last figure given.” This statement
is of the same nature as, “The body weighs 15 lbs. 3 oz., correct
to an ounce.”
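The passage from known limits to a maximum relative error is a one-line calculation. The sketch below is an added illustration; `max_relative_error` is a hypothetical helper name, and the inputs are the unemployment figures used in the text.

```python
def max_relative_error(estimate, lower, upper):
    """Largest possible |true - estimate| / estimate when the true value
    is known to lie between `lower` and `upper`."""
    return max(upper - estimate, estimate - lower) / estimate


# Estimate 4.5 per cent.; enumeration shows the truth lies between 1.0 and 8.0.
wide = max_relative_error(4.5, 1.0, 8.0)
# Estimate 4.5 per cent., believed to be within .5 of the fact.
narrow = max_relative_error(4.5, 4.0, 5.0)

print(f"limits 1.0 to 8.0: maximum error {wide:.0%}")
print(f"limits 4.0 to 5.0: maximum error {narrow:.0%}")
```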

While, on the one hand, it is clear that we cannot often
obtain close definite limits to our errors, on the other we can
very often see that some of the digits in a total are almost
certainly right and others almost certainly wrong. Thus when
we see in the Registrar-General's Report that the population of
the United Kingdom in 1895 was 39,124,496, the estimate being
made from the census of 1891, and the increase calculated on
the basis of the increase since 1881, we may be certain that
the last two, or the last three, digits are no better than
guesswork; while the first two, or the first three, are correct.
Thus the statement should read: Population was 39.1 millions, or
39,124,000 ± 5000, or whatever figures our examination of the
varying rate of progress of the population led us to adopt, and
such a statement is actually more correct than the previous
one.

It is the custom in many classes of estimates to give the
figures to the uttermost farthing. This is possibly right in
official publications; for the duty of the office is to receive
and tabulate returns, stating how and whence they came, and it
may leave to the economist or the statistician the task of
deciding the degree of accuracy pertaining to them. But in
summary descriptions and accounts, and in scientific estimates,
it is not merely unnecessary to give these last figures (both
because they are not accurately known, and because they generally
have no importance to the argument or significance to the
reader), but it is positively inaccurate. The easiest way to
avoid the inaccuracy is simply to state totals in so many
thousands (e.g., the earth is 8000 miles in diameter), or if for
any reason more exact measure be required (as when we are
comparing the equatorial diameter with the smaller one through
the poles), the scientific way is to give the number as far as it
has been fairly calculated, and to indicate its precision.


Rules  for  Computing  the  Effect  of  Relative  Errors. 

We  may  now  give  some  rules  connecting  the  errors  of  a 
complex  estimate  with  those  of  the  elements  which  form  it. 

I.  The  error  in  an  estimated  sum  is  equal  to  the  sum  of  the 
errors  in  the  parts  when  each  is  multiplied  by  the  ratio  of  the 
corresponding  part  to  the  sum. 

Error in sum.

For if we estimate $n$ quantities as $u_1, u_2, \ldots, u_n$, and
their sum as $u$, so that $u = u_1 + u_2 + \ldots + u_n$, and the
errors of the quantities are $e_1, e_2, \ldots, e_n$ and that of
the sum is $e$: then the true value of the sum is $u(1+e)$, and
the true values of the parts are $u_1(1+e_1)$, $u_2(1+e_2)$, ...,
so that

$$u(1+e) = u_1(1+e_1) + u_2(1+e_2) + \ldots + u_n(1+e_n);$$

but

$$u = u_1 + u_2 + \ldots + u_n;$$

hence, by subtraction,

$$ue = u_1e_1 + u_2e_2 + \ldots + u_ne_n,$$

and

$$e = e_1 \times \frac{u_1}{u} + e_2 \times \frac{u_2}{u} + \ldots + e_n \times \frac{u_n}{u}.$$


The  formula  is  easily  adapted  to  the  case  where  some  of  the 
parts  are  subtractive. 

To take an arithmetical example, if average working-class
expenditure on food, clothes and rent was estimated in 1914 as
25s., 5s. 6d., and 6s. 6d. respectively, while the true averages
were 27s., 4s. 6d., and 6s., so that the errors are
$+\frac{2}{25}$, $-\frac{2}{11}$ and $-\frac{1}{13}$, then the
error in the sum of the three is

$$+\frac{2}{25} \text{ of } \frac{25}{37} - \frac{2}{11} \text{ of } \frac{5\frac{1}{2}}{37} - \frac{1}{13} \text{ of } \frac{6\frac{1}{2}}{37} = +.054 - .027 - .0135$$

$$= +.0135, \text{ or } +1.35 \text{ per cent.}$$
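Rule I. can be verified against a direct calculation. The Python sketch below is an added illustration; the helper name `error_in_sum` is an assumption, and the data are the 1914 expenditure figures of the example, expressed in shillings.

```python
def error_in_sum(estimates, errors):
    """Rule I: each part's relative error weighted by part / sum."""
    total = sum(estimates)
    return sum(e * u / total for u, e in zip(estimates, errors))


estimates = [25.0, 5.5, 6.5]    # food, clothes, rent (estimated)
true_values = [27.0, 4.5, 6.0]  # the supposed true averages

# Relative error of each part: (true - estimate) / estimate.
errors = [(t - u) / u for u, t in zip(estimates, true_values)]

by_rule = error_in_sum(estimates, errors)
direct = (sum(true_values) - sum(estimates)) / sum(estimates)

print(f"by Rule I: {by_rule:+.4f}   directly: {direct:+.4f}")
```

For a sum the rule is exact, not approximate, so the two figures agree to rounding.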

We can apply the rule to the important case where we
can estimate a great part of a required total with considerable
accuracy, while we are ignorant of a smaller part. Thus we
may receive returns from several unions that 33,650 are out
of work, and have reason to know that the error is not more
than 1 per cent., while some smaller unions do not send any
returns; we make an estimate for the smaller unions, say that
1000 of their members are unemployed, and suppose a very
large error, say $\frac{2}{3}$ or 67 per cent. Then the error in
the total is less than

$$\frac{1}{100} \times \frac{33650}{34650} + \frac{2}{3} \times \frac{1000}{34650} = .029,$$

an error very much nearer that of the larger returns than that
of the smaller. In the preceding sentence we say “less than,”
because we assume that we have taken an outside limit for the
smaller errors.
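The same rule yields an outside limit when only bounds on the separate errors are known. In this added sketch the name `sum_error_bound` is hypothetical; the figures are those of the union returns above.

```python
def sum_error_bound(parts):
    """Outside limit for the relative error of a total.

    `parts` is a list of (estimate, bound on |relative error|) pairs.
    """
    total = sum(estimate for estimate, _ in parts)
    return sum(bound * estimate / total for estimate, bound in parts)


bound = sum_error_bound([
    (33650, 1 / 100),  # returns received, error not more than 1 per cent.
    (1000, 2 / 3),     # estimate for the unions making no return
])
print(f"error in the total is less than {bound:.3f}")
```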


II.  The  error  in  the  arithmetic  average  of  several  estimates  is 
the  sum  of  the  errors  of  these  estimates,  when  each  is  multiplied  by 
the  ratio  of  the  corresponding  estimate  to  that  of  the  sum  of  the 
estimates. 


Error in average.

For if $m_1, m_2, \ldots, m_n$ are $n$ estimates of quantities whose
true values are $m_1(1+e_1)$, $m_2(1+e_2)$, ..., the estimated and
true averages are respectively

$$\frac{m_1 + m_2 + \ldots + m_n}{n} \quad\text{and}\quad \frac{m_1(1+e_1) + m_2(1+e_2) + \ldots + m_n(1+e_n)}{n},$$

and the error in the average is

$$\left\{\frac{m_1(1+e_1) + m_2(1+e_2) + \ldots}{n} - \frac{m_1 + m_2 + \ldots}{n}\right\} \div \frac{m_1 + m_2 + \ldots}{n} = \frac{e_1m_1 + e_2m_2 + \ldots}{m_1 + m_2 + \ldots}$$

$$= e_1 \times \frac{m_1}{S\,m} + e_2 \times \frac{m_2}{S\,m} + \ldots,$$

where $S\,m$ denotes the sum of all the $m$'s.

It is easily seen that no individual error can have much
influence on the result, that the error in the average would be
nearly of the same magnitude as one of the individual errors, if
these were not very unequal and all positive or all negative, and
that if, as is generally the case, some are positive and some
negative (a point we shall consider presently), the error would
be considerably lessened.
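A short numerical sketch may make the two cases concrete. It is an added illustration: the four estimates and both sets of errors are hypothetical figures chosen for the purpose, and `error_in_average` is an assumed name.

```python
def error_in_average(estimates, errors):
    """Rule II: each estimate's error weighted by estimate / sum of estimates."""
    total = sum(estimates)
    return sum(e * m / total for m, e in zip(estimates, errors))


m = [20.0, 22.0, 24.0, 26.0]           # hypothetical estimates
same_sign = [0.05, 0.04, 0.06, 0.05]   # errors all positive
mixed = [0.05, -0.04, 0.06, -0.05]     # same magnitudes, signs mixed

for label, errors in (("all positive", same_sign), ("mixed signs", mixed)):
    true_values = [mi * (1 + e) for mi, e in zip(m, errors)]
    direct = (sum(true_values) - sum(m)) / sum(m)
    by_rule = error_in_average(m, errors)
    print(f"{label}: by Rule II {by_rule:+.4f}, directly {direct:+.4f}")
```

With errors all of one sign the result is of the same order as a single error; with mixed signs most of it cancels.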
III. The error in a weighted average is the sum of (1) an error
due to errors in the quantities, similar to the error of an
unweighted average, and (2) an error due to errors in the weights,
which becomes very small when the original quantities are nearly
equal.


Error in weighted average.

Let $W_1, W_2, \ldots, W_n$ be estimated weights applied to $n$
estimated quantities $M_1, M_2, \ldots, M_n$, and let the true values
of the weights be $W_1(1+\epsilon_1)$, $W_2(1+\epsilon_2)$, ..., and of
the quantities be $M_1(1+e_1)$, $M_2(1+e_2)$, ...

Write $M_w = \frac{SWM}{SW}$, so that $M_w$ is the estimated weighted
average, and let $M_w(1+E)$ be its true value.

Then

$$E \cdot M_w = \frac{S\{W(1+\epsilon)\,M(1+e)\}}{S\{W(1+\epsilon)\}} - \frac{SWM}{SW}$$

$$= \frac{SW \cdot S\{W_tM_t(1+\epsilon_t)(1+e_t)\} - SWM \cdot S\{W_t(1+\epsilon_t)\}}{SW \cdot S\{W(1+\epsilon)\}},$$

where the suffix $t$ denotes any selected quantity, etc.

Then

$$E \cdot SWM \cdot S\{W(1+\epsilon)\} = SW \cdot SW_tM_te_t + SW \cdot SW_tM_t(\epsilon_t + e_t\epsilon_t) - SWM \cdot SW_t\epsilon_t.$$

Now suppose $E$, $e_t$, $\epsilon_t$ to be as small as .1, and neglect
products which are as small as .01.

$$E \cdot SWM \cdot SW = SW \cdot SW_tM_te_t + S\{W_t(M_t \cdot SW - SWM)\epsilon_t\}$$

$$\therefore\; E = \frac{SW_tM_te_t}{SWM} + \frac{S\{W_t(M_t \cdot SW - SWM)\epsilon_t\}}{SWM \cdot SW}.$$

The term involving $e_t$, the error in a quantity, is the same as
that in Rule II., if $W_tM_t$ is written for $m_t$, etc.

The coefficient of $\epsilon_t$ needs further analysis.

Since $SWM = M_w \cdot SW$, $M_t \cdot SW - SWM = SW \cdot (M_t - M_w) = {}_wm_t \cdot SW$,
where ${}_wm_t$ is the excess of a quantity over the weighted average.

$$\therefore\; E = S\left(\frac{W_tM_t}{SWM} \cdot e_t\right) + S\left(\frac{W_t \cdot {}_wm_t}{SWM} \cdot \epsilon_t\right).$$

Hence the resulting error due to the errors in quantities involves
the magnitudes $M_1$, $M_2$, etc., while that due to the errors in
weights involves only the deviations of these quantities from their
weighted average. These deviations are individually small if the
dispersion of the quantities about their mean is small relatively to
that mean. Further, the sum of the coefficients
$SW_t \cdot {}_wm_t = SW_tM_t - M_w \cdot SW = 0$;
if the errors in weights are all equal the resulting error in the
average is zero, as is evident a priori, and if positive errors are
not generally found with positive deviations (${}_wm_t$) and negative
with negative, and if large errors are not generally found with large
weights (and vice versa), the sum of the terms
$W_t \cdot {}_wm_t \cdot \epsilon_t$ tends to be small.

Hence  the  errors  in  weights  have  an  effect  which  not  only 
diminishes  from  the  same  causes  as  affect  the  errors  in  quantities, 
but  also  have  coefficients  which  have  a  strong  tendency  to  neutralise 
one  another,  unless  the  magnitudes  of  the  errors,  quantities  and 
weights  are  associated  with  each  other  in  special  ways.  Great 
errors  are  required  in  the  weights,  if  many  quantities  are  involved, 
to  make  an  appreciable  error  in  the  average.  In  fact,  the  errors 
in  quantities  have  so  much  more  influence  than  those  in  weights,  when 
once  the  weights  have  been  reasonably  estimated,  if  the  quantities  are 
not  very  unequal,  that  errors  in  the  weights  can  very  frequently  be 
neglected.  Several  numerical  examples  of  this  principle  were  given 
in  the  section  on  weighted  averages. 
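The separation of the error into a part due to quantities and a part due to weights can be tried numerically. The sketch below is an added illustration: the weights, quantities and errors are hypothetical, and `rule_iii` is an assumed helper name. The weight errors are deliberately made larger than the quantity errors.

```python
def weighted_mean(weights, quantities):
    return sum(w * m for w, m in zip(weights, quantities)) / sum(weights)


def rule_iii(weights, quantities, e, eps):
    """First-order error of a weighted average, split into the part due to
    errors in quantities (e) and the part due to errors in weights (eps)."""
    swm = sum(w * m for w, m in zip(weights, quantities))
    mw = swm / sum(weights)
    from_quantities = sum(w * m * x for w, m, x in zip(weights, quantities, e)) / swm
    # Weight errors enter only through the deviations (M_t - M_w).
    from_weights = sum(w * (m - mw) * x
                       for w, m, x in zip(weights, quantities, eps)) / swm
    return from_quantities, from_weights


W = [3.0, 2.0, 5.0]          # hypothetical estimated weights
M = [100.0, 110.0, 95.0]     # hypothetical estimated quantities
e = [0.02, -0.01, 0.03]      # errors in quantities
eps = [0.10, -0.08, 0.05]    # errors in weights

true_W = [w * (1 + x) for w, x in zip(W, eps)]
true_M = [m * (1 + x) for m, x in zip(M, e)]
exact = weighted_mean(true_W, true_M) / weighted_mean(W, M) - 1
q_part, w_part = rule_iii(W, M, e, eps)

print(f"due to quantities {q_part:+.4f}, due to weights {w_part:+.4f}")
print(f"sum of the two {q_part + w_part:+.4f}, exact error {exact:+.4f}")
```

In this instance the weight errors, though several times larger than the quantity errors, contribute the smaller share, because the quantities lie close to their weighted average.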

IV. The error in a product is approximately the sum of the
errors in its factors, due regard being paid to sign.

Error in product.

For if $f_1, f_2, \ldots, f_n$ are the estimated factors, whose true
values are $f_1(1+e_1)$, $f_2(1+e_2)$, ..., then the error of the
product

$$= \frac{f_1(1+e_1) \cdot f_2(1+e_2) \cdots - f_1 \cdot f_2 \cdots}{f_1 \cdot f_2 \cdots}$$

$$= (1+e_1)(1+e_2) \cdots - 1 = e_1 + e_2 + \ldots + e_n,$$

if we neglect products of two or more $e$'s.

The  e’s  are  equally  likely,  a  priori ,  to  be  positive  or  negative. 
If  two  e’s  are  of  different  signs,  they  tend  to  neutralise  one 
another.  The  error  in  a  product  may  be  great  if  all  the  errors 
of  the  factors  are  of  the  same  sign,  even  if  they  are  small 
individually. 

For example, if we estimate that 100 men are earning on the
average 25s. each, while in reality there are 105 men earning
26s., the error in the estimated total sum earned is, by formula,
$\frac{5}{100} + \frac{1}{25} = .09$.

If, with the same estimates, the real quantities had been
105 and 24s., the error in the product would have been
$\frac{5}{100} - \frac{1}{25} = .01$.

V. The error in a ratio is approximately the difference
between the errors in its two terms, due regard being had to sign.

Error in ratio.

For if $u_1$, $u_2$ be the estimated terms, whose true values are
$u_1(1+e_1)$ and $u_2(1+e_2)$, then the error in the ratio is

$$\left\{\frac{u_1(1+e_1)}{u_2(1+e_2)} - \frac{u_1}{u_2}\right\} \div \frac{u_1}{u_2} = \frac{1+e_1}{1+e_2} - 1 = \frac{e_1 - e_2}{1+e_2}$$

$$= (e_1 - e_2)(1 - e_2 + e_2^2 - e_2^3 + \ldots)$$

$$= e_1 - e_2,$$

if we neglect terms of the second order in the $e$'s.

If the errors in the terms are both positive or both negative,
they tend to neutralise one another; if they are also nearly equal,
the error in the ratio becomes very small.
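A brief numerical check of Rule V. follows. It is an added sketch; the function names are assumptions and the two pairs of errors are hypothetical values chosen to show the neutralising effect.

```python
def ratio_error_exact(e1, e2):
    """Exact relative error of u1/u2 when the terms have errors e1 and e2."""
    return (1 + e1) / (1 + e2) - 1


def ratio_error_approx(e1, e2):
    """Rule V: difference of the errors in the two terms."""
    return e1 - e2


# Hypothetical errors: first of like sign and nearly equal, then of unlike sign.
for e1, e2 in ((0.05, 0.04), (0.05, -0.04)):
    print(f"e1 = {e1:+.2f}, e2 = {e2:+.2f}: "
          f"by rule {ratio_error_approx(e1, e2):+.4f}, "
          f"exact {ratio_error_exact(e1, e2):+.4f}")
```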


We can apply Rule V. to the error in comparison of two
averages of similar quantities estimated at different dates.

With the same notation as under Rules II. and III., using
$m$, $e$, $\epsilon$ for the quantities at one date, and $m'$, $e'$,
$\epsilon'$ for similar quantities at another date, then the error in
the ratio of the simple average of $m'_1, m'_2, \ldots$ to the simple
average of $m_1, m_2, \ldots$ is

$$\frac{e'_1m'_1 + e'_2m'_2 + \ldots}{m'_1 + m'_2 + \ldots} - \frac{e_1m_1 + e_2m_2 + \ldots}{m_1 + m_2 + \ldots}.$$

Now if the quantities have not changed much during the period
between two observations, the fraction $\frac{m'_1}{S\,m'}$ will differ
little from $\frac{m_1}{S\,m}$, and so on.

Neglecting these differences in comparison with the quantities
themselves, a legitimate process when we are estimating the
approximate influence of errors, we have

$$\text{Error} = S\left\{\frac{m'_t}{S\,m'}\,(e'_t - e_t)\right\}.$$

If the two estimates have been made under nearly similar
circumstances, leading to similar chances of errors, $e'_1$ and $e_1$
are likely to be not only of the same sign, but nearly equal.

Write $d_1, d_2, \ldots$ for $(e'_1 - e_1), (e'_2 - e_2), \ldots$, and we
have

$$\text{Error} = \frac{d_1m'_1 + d_2m'_2 + \ldots}{m'_1 + m'_2 + \ldots},$$

where the $d$'s may be small.


The corresponding analysis for the error in the ratio of two
weighted averages is too complicated to be given here;* but
using the principle that errors in weight are less important than
errors in quantity, which applies with slight modifications, we
may use the formula just given for the first approximation to
the error in the ratio of two weighted averages. This formula
may be put in words:

VI. The error in the ratio of two averages of similar series
of quantities, estimated at different dates, is approximately equal
to the sum of the differences between the errors in the
corresponding terms of the two series, each multiplied by the ratio
of the latter of these corresponding terms to the sum of all the
terms at the latter date.

* It will be found in the Statistical Journal, 1911, pp. 85 seq.,
and, in a modified form, in Part II, Appendix, Note 7.

Error in averages.

This rule is so important that it will be worth while to
illustrate it by an example, in which a further quantity will be
introduced.

If in each of two years we are able to estimate, as in our example
under Rule I., one part of a total more accurately than another part,
we can use the following formulae:

                                           First Year.               Second Year.
Estimated numbers or weights               $w$; error $\epsilon$     $w'$; error $\epsilon'$
Estimated average income, or quantity      $m_1$; error $e_1$        $m'_1$; error $e'_1$
Estimated number, less accurately known    $rw$; error in $r$, $\rho$     $r'w'$; error in $r'$, $\rho'$
Estimated income                           $m_2$; error $e_2$        $m'_2$; error $e'_2$

$e_1$ and $e'_1$ are, by hypothesis, less than $e_2$ and $e'_2$.

Error in average for first year:

$$\left\{\frac{w(1+\epsilon) \cdot m_1(1+e_1) + r(1+\rho) \cdot w(1+\epsilon) \cdot m_2(1+e_2)}{w(1+\epsilon) + r(1+\rho) \cdot w(1+\epsilon)} - \frac{wm_1 + rwm_2}{w + rw}\right\} \div \frac{wm_1 + rwm_2}{w + rw}$$

$$= e_1 \cdot \frac{m_1}{m_1 + rm_2} + e_2 \cdot \frac{rm_2}{m_1 + rm_2} + \rho \cdot \frac{r}{1+r} \cdot \frac{m_2 - m_1}{m_1 + rm_2},$$

if we neglect products of $e$ and $\rho$.

Here the errors, $e_2$ and $\rho$, connected with the less accurately
known part, are each multiplied by $r$, the ratio of the weight of
that part to the weight of the better known part; $\rho$ is multiplied
by $m_2 - m_1$, which in many cases is small, while $e_1$, the
remaining error, is by hypothesis small.

If for simplicity of argument we assume that the ratio of the
unknown part to the whole (but not the error in estimating it)
has remained unchanged, and also that the ratio of the estimated
average incomes of the two parts has not altered, we have for the
error in comparison

$$(e'_1 - e_1)\,\frac{m_1}{m_1 + rm_2} + (e'_2 - e_2)\,\frac{rm_2}{m_1 + rm_2} + (\rho' - \rho)\,\frac{r}{1+r} \cdot \frac{m_2 - m_1}{m_1 + rm_2}.$$


Thus in estimating the change in average wages of Scotch
agricultural labourers, we have figures similar in character to
the following:

                                    1867.                      1892.
                                                        Corresponding Numbers.
Married Ploughmen.
  Estimated number        1,000   Average income  £36     1,200   £49  0  0
  Supposed true number    1,010      "       "     35     1,220    48  0  0

Farm-Servants.
  Estimated number          200   Average income:           240
                                    Money                 £21             £27  5  0
                                    Estimated value
                                      of board             13              14  0  0
                                    Total                 £34             £41  5  0
  Supposed true number      220   Total income            £37     240     £47  0  0

Here $w = 1{,}000$, $m_1 = 36$, $r = \frac{1}{5}$, $m_2 = 34$,
$w' = 1{,}200$, $m'_1 = 49$, $r' = \frac{1}{5}$, $m'_2 = 41\frac{1}{4}$,
$\epsilon = \frac{1}{100}$, $e_1 = -\frac{1}{36}$, $\rho = \frac{9}{101}$,
$e_2 = \frac{3}{34}$, $\epsilon' = \frac{1}{60}$, $e'_1 = -\frac{1}{49}$,
$\rho' = -\frac{1}{61}$, $e'_2 = \frac{23}{165}$.


Here  it  is  supposed  that  we  have  overvalued  the  income  of 
the  married  ploughmen,  and  undervalued  that  of  the  farm- 
servants  in  both  cases.  We  suppose,  as  is  the  fact,  that  the 
value  of  the  board  and  other  perquisites  of  the  farm-servants 
cannot  be  estimated  with  precision,  and  that  the  proportionate 
numbers  in  the  two  classes  are  not  accurately  known. 

Substituting  in  the  above  formula  we  find  that  the  error  in 
the  estimated  ratio  of  the  average  incomes  of  the  two  classes 
together  in  the  two  years  is — 


+.0062, due to errors in estimates of income of ploughmen.
+.0081, due to errors in estimates of income of servants.
+.0008, due to errors in ratios of the numbers in the two classes.


Thus  the  last  error,  due  to  weights,  is  very  small,  and  the 
second  error,  due  to  ignorance  of  the  value  of  board,  is  reduced 
by  the  smallness  of  the  number  employed  to  a  magnitude 
comparable  with  the  first. 

The whole error is, therefore, by formula +.0151. Going
to the actual figures, we find the estimated ratio of the second to
the first to be 1.3376 to 1, and the supposed true ratio to be
1.3529 to 1; that is, the error is
$\frac{.0153}{1.3376} = +.011$.

The  difference  between  the  two  methods  of  calculation  is 
accounted  for  by  the  neglect  of  the  less  important  terms. 

It  is  to  be  noticed  that  the  error  in  the  ratio  of  two  quanti¬ 
ties  is  not  the  same  as  the  error  which  we  might  be  inclined  to 
estimate,  the  error  in  the  percentage  increase.  Thus  in  the 
case just taken, the estimated and true percentage increases are
33.8 and 35.3, and the relative error in the percentage increase
is .045. For accuracy in such calculations, then, we require
the  error  found  by  formula,  according  to  Rule  VI.,  to  be  very 
small. 
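The whole of the Scotch example can be reworked as a check on the arithmetic. The Python below is an added sketch, not the author's working: the variable names are assumptions, and the true value of the farm-servants' income in 1892 is taken as 47, with their true number 240, as in the table.

```python
def average(n1, m1, n2, m2):
    """Average income of two classes with numbers n1, n2 and incomes m1, m2."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


# Direct calculation from the estimated and the supposed true figures.
est_ratio = average(1200, 49, 240, 41.25) / average(1000, 36, 200, 34)
true_ratio = average(1220, 48, 240, 47) / average(1010, 35, 220, 37)
direct = (true_ratio - est_ratio) / est_ratio

# The three terms of the formula for the error in comparison.
m1, m2, r = 36.0, 34.0, 1 / 5
e1, e1_later = -1 / 36, -1 / 49
e2, e2_later = 3 / 34, 23 / 165
rho, rho_later = 9 / 101, -1 / 61
denom = m1 + r * m2

ploughmen = (e1_later - e1) * m1 / denom
servants = (e2_later - e2) * r * m2 / denom
numbers = (rho_later - rho) * r / (1 + r) * (m2 - m1) / denom
by_formula = ploughmen + servants + numbers

# Relative error in the percentage increase, as distinct from the ratio.
pct_error = ((true_ratio - 1) - (est_ratio - 1)) / (est_ratio - 1)

print(f"terms: {ploughmen:+.4f} {servants:+.4f} {numbers:+.4f}")
print(f"by formula {by_formula:+.4f}, directly {direct:+.4f}")
print(f"error in percentage increase {pct_error:+.3f}")
```

The error in the percentage increase is about four times that in the ratio, because the same absolute discrepancy is divided by .34 instead of 1.34.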


Another example is found from the well-known difficulty of
estimating the relative importance of expenditure on clothing in a
workman's family budget.

The following estimates were used in the Report on the Cost
of Living, 1918 (Cd. 8980, pp. 7, 18 and 23).

Skilled Workmen, Average Weekly Expenditure.

                1914.      1918.         Ratio.
Food            27s.       49s. 10d.     1.84
Clothing         7s.       13s.  9d.     1.96
Together        34s.       63s.  7d.     1.864

Here we take $w = 27$, $r = \frac{7}{27}$, $m_1 = 1.84$, $m_2 = 1.96$.

Suppose that $r$ ought to have been taken as $\frac{1}{3}$, and
$m_1$, $m_2$ as 1.90, 2.10.

Then $e_1 = \frac{.06}{1.84} = .0326$, $e_2 = \frac{.14}{1.96} = .0714$,
and $\rho = .286$.

The resulting error by formula is:

+.0256, due to error in the ratio of food expenditure at the two dates.
+.0155, due to error in the ratio of clothing expenditure at the two dates.
+.0030, due to error in the ratios of the expenditures on clothing and food.

And the whole relative error is .044.

The effects of the errors are in the reverse order of their
magnitude, and the great error in the clothing ratio barely affects
the second decimal place in the result.

If, however, $m_2 - m_1$ had been larger, that is if the estimated
increase in expenditure on clothing had been much greater than
that on food, the effect of this error would have been proportionately
more.
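The dependence of the third term on $m_2 - m_1$ can be seen by recomputing the cost-of-living example with other clothing ratios. This is an added sketch: `average_error_terms` is an assumed name, and the clothing ratios 2.50 and 3.00 are hypothetical values introduced only for comparison; 1.96 is the figure of the text.

```python
def average_error_terms(e1, e2, rho, r, m1, m2):
    """The three first-order terms of the error in a two-part average."""
    denom = m1 + r * m2
    return (e1 * m1 / denom,
            e2 * r * m2 / denom,
            rho * r / (1 + r) * (m2 - m1) / denom)


r = 7 / 27
rho = (1 / 3 - r) / r          # error in r, if r should have been 1/3
e1 = (1.90 - 1.84) / 1.84      # error in the food ratio
e2 = (2.10 - 1.96) / 1.96      # error in the clothing ratio

food, clothing, weights = average_error_terms(e1, e2, rho, r, 1.84, 1.96)
print(f"{food:+.4f} {clothing:+.4f} {weights:+.4f}, "
      f"whole error {food + clothing + weights:.3f}")

# Hypothetical larger clothing ratios, with the same error rho in the weights.
for m2 in (1.96, 2.50, 3.00):
    term = average_error_terms(e1, e2, rho, r, 1.84, m2)[2]
    print(f"m2 = {m2:.2f}: term due to weights {term:+.4f}")
```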

We  return  to  the  whole  question  of  relative  errors  as  illuminated 
by  the  theory  of  probability  in  Part  II,  Chapter  IV,  below. 

Biassed  and  Unbiassed  Errors. 

Errors are biassed or unbiassed.

In the consideration of all errors in averaging or comparing,
it is important to distinguish two classes of errors, those which
are biassed and those which are unbiassed. The difference can be
made clear by illustrations. If a number of men are sent to
investigate the condition of an industry in different places,
with a view of proving that wages are high, conditions of work
healthy, and so on, they would probably, by examining only the
best conducted works, and taking the wages only of the more
skilled and regular workmen, produce an average for each town
which would be too high. On the other hand, if there was no brief
to be held, but the investigation was impartial, the
commissioners would in some towns take too high an average, in
others too low, according to their idiosyncrasies and to
circumstances. In the first case, the errors would be biassed,
all in the same direction, all tending to increase the average,
whose error would be equal to the average error in the different
towns. In the second case, the errors would be unbiassed, just as
likely to be in excess or defect, and the more estimates made,
the smaller would the resulting error be. The following figures
would illustrate this:


                                  Fact.    Biassed      Unbiassed
                                           Estimate.    Estimate.
                                   s.        s.           s.
Average Wages in District a        24        25           24
   "      "         "     b        23        25           25
   "      "         "     c        26        27           25
   "      "         "     d        27        28           28
   "      "         "     e        28        30           27

Averages                           25.6      27           25.8
Errors                             ...       5.2 %        1 %
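The averages and errors of the table can be recomputed in a few lines. This added sketch uses the table's own figures; the variable names are assumptions, and the error is reckoned as defined earlier in the chapter, so both come out negative (the estimates exceed the fact), the table giving magnitudes only.

```python
fact = [24, 23, 26, 27, 28]        # shillings, districts a to e
biassed = [25, 25, 27, 28, 30]     # every estimate at or above the fact
unbiassed = [24, 25, 25, 28, 27]   # some above, some below


def mean(values):
    return sum(values) / len(values)


def relative_error(estimate, true_value):
    return (true_value - estimate) / estimate


for label, estimates in (("biassed", biassed), ("unbiassed", unbiassed)):
    err = relative_error(mean(estimates), mean(fact))
    print(f"{label}: average {mean(estimates):.1f}s., error {err:+.1%}")
```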
In measuring the distance of a bicycle ride on a mile-stoned
road, it is found that the distances between successive
milestones are not exact, but perhaps 50 to 100 yards out; but
it may be nearly as likely that the errors will be in excess or
defect, and the greater the distance gone the smaller will be
the error, as defined. The errors are unbiassed. If, on the
other hand, the bicyclist trusts to his cyclometer, he will
have to deal with a biassed error, for the instrument will not
fit the wheel exactly, but will always register say 1800 yards
when the machine has gone a mile. This is a case where the
bias can be measured and allowed for, whereas the unbiassed
errors must be left to eliminate themselves. It is frequently
the case that biassed errors are due to a wrongly graduated
instrument; unbiassed to separate faulty measurements.
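The contrast between the milestones and the cyclometer can be imitated by a small simulation. This is an added sketch under stated assumptions: each milestone gap is supposed to be out by an independent amount drawn uniformly between 100 yards short and 100 yards long (the text says only that the faults are "perhaps 50 to 100 yards"), and the function names and the seed are arbitrary.

```python
import random

YARDS_PER_MILE = 1760


def milestone_error(n_miles, max_fault=100.0, seed=0):
    """Relative error of the milestone count when every gap is out by an
    independent fault, as likely to be in excess as in defect (assumed uniform)."""
    rng = random.Random(seed)
    true_total = sum(YARDS_PER_MILE + rng.uniform(-max_fault, max_fault)
                     for _ in range(n_miles))
    nominal = n_miles * YARDS_PER_MILE
    return (true_total - nominal) / nominal


def cyclometer_error(n_miles, registered_per_mile=1800):
    """Relative error of a cyclometer that registers 1800 yards for every mile."""
    registered = n_miles * registered_per_mile
    true_total = n_miles * YARDS_PER_MILE
    return (true_total - registered) / registered


for miles in (1, 100, 10000):
    print(f"{miles:>6} miles: milestones {milestone_error(miles):+.5f}, "
          f"cyclometer {cyclometer_error(miles):+.5f}")
```

The cyclometer's error is the same fraction at every distance and can be corrected once measured; the milestone error has no fixed value but shrinks relative to the whole as the ride lengthens.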

In  the  census  returns,  the  fact  that  many  women  return 
themselves  as  younger  than  their  birth  certificate  states,  causes 
a  biassed  error  in  the  average  age  of  the  population ;  the  fact 
that  people  frequently  return  their  ages  at  the  nearest  round 
number  causes  unbiassed  error,  and  on  the  whole  affects  the 
average  little.  It  is  not  improbable  that  in  the  Wage  Census  of 
1906,  there  was  some  tendency  to  obtain  returns  from  the 
more  liberally  conducted  establishments  in  some  industries ; 
this  causes  a  biassed  error  in  the  average  obtained.  With  these 
illustrations we can pass on to another principle of great
importance.

[Sidenote: Importance of biassed and unbiassed errors.]

Unbiassed errors are of little importance compared with biassed
errors in a simple estimate; but biassed errors diminish when the
ratio of two similar estimates is taken.

For in an average of several quantities, which have biassed
errors ($\eta_1, \eta_2, \ldots$) and unbiassed errors ($e_1, e_2, \ldots$), it is easy
to see from Rule II. that the resulting error may be written

$$\frac{e_1 + e_2 + \cdots + e_n}{n} + \frac{\eta_1 + \eta_2 + \cdots + \eta_n}{n}.$$

In the first term, the errors being unbiassed, many of them are
positive, many of them negative, and they tend to neutralise one
another; in fact, if $E$ is typical of the errors $e_1, e_2, \ldots, e_n$, then a first
approximation to the error arising from them in the average is

$$\frac{2}{3} \cdot \frac{E}{\sqrt{n}}.$$ *

* It is as likely as not that so great an error would be obtained. See
Part II, Chap. IV.

Thus in the average of one hundred measurements, whose
individual unbiassed errors are about $E$, the resulting error
may be no greater than $\frac{2}{3} \times E \div \sqrt{100} = \frac{E}{15}$. There is no
counterbalancing tendency, on the other hand, in the biassed
errors; if each estimate was 10 per cent. in excess, then the
average is also 10 per cent. in excess.

[Sidenote: Great effect of biassed errors.]

When aiming at accuracy our principle always is to take care of
the pounds, and let the pence take care of themselves;
and it is quite futile to diminish the unbiassed errors,
that  is  to  increase  the  precision  of  our  measurements,  while  a 
large  biassed  error  runs  through  them  all.  If  we  do  not  know 
of  the  existence  of  biassed  errors,  which  in  reality  pervade 
our  estimates,  there  is  no  remedy ;  if  we  do  know  of  them,  we 
are likely to obtain more accuracy by the most erroneous corrections
for them than by neglecting them; for when we make
unbiassed  corrections  for  our  biassed  errors,  we  reduce  them  to 
unbiassed  errors,  and  then  the  more  terms  we  include  in  our 
average  the  smaller  is  our  resulting  error.  If,  for  instance, 
we  find  that  the  average  weekly  wage  of  agricultural  labourers 
throughout the country is 13s., and by considering the circumstances
of the thousand returns which we may suppose led
to this average we have reason to suppose that an error of
1s. would be typical of the unbiassed errors in them, then
an error of $\frac{2}{3}$ of $\frac{1s.}{\sqrt{1000}}$, that is only a farthing, may be expected
to result in the average. We have here a totally illusive
accuracy;  the  part  of  the  labourer’s  income  which  we  have 
not  included,  payments  at  haytime  and  harvest,  facilities  for 
piece-work,  cheap  rent  for  cottage  and  land  and  smaller 
perquisites,  is  not  capable  of  exact  calculation.  If  we  omit 
all these entirely we shall leave an error in our average of 2s.
or so; but if we make individual estimates of these additions,
in all the thousand cases, though each estimate may be 2s.
wrong, if there is no bias, the resulting error in the average
may be expected to be $\frac{2}{3}$ of $\frac{2s.}{\sqrt{1000}}$, that is only ½d.: our
whole error now may be less than 1d., instead of 2s. In
estimating  the  accuracy  of  published  averages,  these  principles 
should  be  always  borne  in  mind,  and  the  possibility  of  biassed 
errors  always  considered. 
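
The two calculations in this illustration can be reproduced in a few lines. The sketch below assumes only the first approximation used in the text, two-thirds of the typical error divided by the square root of the number of returns; the function and variable names are illustrative.

```python
import math


def probable_error_of_average(typical_error, n):
    """First approximation used in the text: two-thirds of E / sqrt(n)."""
    return (2 / 3) * typical_error / math.sqrt(n)


PENCE_PER_SHILLING = 12

# Typical unbiassed error of 1s. in each of 1000 wage returns.
wage_error_d = probable_error_of_average(1 * PENCE_PER_SHILLING, 1000)

# Typical unbiassed error of 2s. in each of 1000 estimates of the extras.
extras_error_d = probable_error_of_average(2 * PENCE_PER_SHILLING, 1000)

# A bias left uncorrected does not shrink with n: omitting extras worth
# about 2s. leaves about 24d. of error however many returns are averaged.
omitted_bias_d = 2 * PENCE_PER_SHILLING

print(round(wage_error_d, 2), "d. from the wage returns")
print(round(extras_error_d, 2), "d. from the estimated extras")
print(omitted_bias_d, "d. if the extras are omitted altogether")
```

The first figure is about a farthing and the second about a halfpenny, so that the two together remain under a penny, against some 24 pence when the extras are ignored.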

[Sidenote: Accuracy of comparisons.]

When we are dealing with the errors of a ratio the case is
quite different. The error of a ratio is approximately equal to
the difference between the errors in its terms; if
$\eta$, $\eta_1$ and $e$, $e_1$ are the biassed and unbiassed errors
in the terms, then by Rule V. $(\eta_1 - \eta) + (e_1 - e)$ is the error
in the ratio. Now the unbiassed error $(e_1 - e)$ is likely to be
of nearly the same magnitude as either $e$ or $e_1$;* if, as in the
above example, $e$ and $e_1$ are each unlikely to be much greater than
some small amount, $(e_1 - e)$ would be unlikely to be much greater
than $\sqrt{2}$ times that amount. But
$(\eta_1 - \eta)$, the result of the biassed errors, will, if the bias in both
terms of the ratio was in the same sense (positive in both, or
negative in both), be less than the original errors. If we have
made  the  estimates  of  both  terms  on  precisely  similar  methods, 
if  we  have  asked  the  same  questions  of  the  same  classes  of 
persons, included and omitted the same details on both occasions,
we shall have made nearly the same errors of bias in both
estimates.  To  return  to  our  previous  illustration,  if  we  have 
made  the  glaring  mistake  of  omitting  everything  except 
average  weekly  wages  in  the  income  of  an  agricultural  labourer 
on  both  occasions,  the  only  resulting  error  in  the  ratio  will  be 
that  due  to  the  change  in  the  proportion  that  these  extra 
payments  bear  to  ordinary  wages,  which  in  short  periods  is 
likely  to  be  small.  Or,  if  we  had  taken  summer  wages  as  the 
average  for  the  year  in  both  cases,  the  error  in  the  ratio  will 
depend  only  on  the  change  in  the  relation  of  summer  wages  to 
that  average.  Hence  the  error  in  the  ratio  of  two  estimates 
at  different  dates  of  a  slowly  changing  quantity  is,  if  the 
estimates  are  made  on  similar  methods,  often  much  smaller 
than  the  error  in  either  estimate  singly ;  for  the  unbiassed  error 
is  little  greater,  and  the  more  important  biassed  error  is  much 
diminished.  We  need  not  now  know  of  the  existence  of  the 
biassed  errors ;  they  will  disappear  of  themselves.  If  we  are 
aware  that  there  are  biassed  errors,  and  have  any  means  of 
making  fairly  good  estimates  of  them,  it  will  be  worth  doing ; 
but  we  shall  make  a  great  mistake  if  we  correct  the  bias  in 
one year and leave it uncorrected in another. For purposes of
comparison it is very seldom of much use and often of great
disutility to make the later estimate more accurate than the
earlier.

* If $E$ is the probable error in $e$ or $e_1$, then $E\sqrt{2}$ is the probable error
in their difference; see Part II, Chap. III.

[Sidenote: Uniformity in structure of serial returns.]

The error resulting from unbiassed errors
can indeed be diminished a little,* but the error
resulting from the more important biassed errors
will only be increased. All Government officials and others
who  compile  annual  returns  are  in  a  dilemma  :  to  make  their 
annual  statements  accurate  in  themselves,  they  should  always 
be straining after improvements, they should always be watching
for changes in the quantities measured and adapting their
methods  and  tabulations  to  these  changes ;  but  to  make  their 
annual  returns  comparable  with  each  other,  they  should  be 
absolutely  conservative,  and  cling  to  any  mistakes  they  or  their 
predecessors  have  made  in  the  past  with  all  the  strength  red 
tape  can  give  them,  being  careful,  however,  not  to  add  to  the 
mistakes  or  make  new  omissions.  The  dilemma  can  in  some 
cases  be  avoided ;  for  when  an  improved  method  is  introduced, 
the  tabulation  can  sometimes  be  given  for  a  few  years  both  on 
the old and on the new plans; then when the difference introduced
by the change is known, the earlier figures can be
brought to the greater precision of the later. Thus the Board
of Trade since 1898 has included in the tabulation of exports
ships which, leaving our shores with merchandise, are themselves
sold to a foreign owner; and we have the following
tabulation:


                                        1899.           1898.
Exports of Home Products
  (exclusive of ships sold to
  foreigners)                       £255,465,000    £233,359,000
Re-exports of Home and
  Colonial Merchandise                65,020,000      60,655,000
                                    ------------    ------------
Total                               £320,485,000    £294,014,000
Value of New Ships exported            9,195,000     Not stated.
                                    ------------
New total                           £329,680,000

* For if $E$ and $E_1$ be typical of the unbiassed errors at the two dates,
then $\sqrt{E_1^2 + E^2}$ is typical of the error in the ratio, which diminishes with
either $E$ or $E_1$. See Part II, Chap. IV, formula (66).

Ignorance of slight alterations in the collection and tabulation
of material has been the cause of many statistical mistakes.
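
The effect of an unnoticed change of basis can be seen from the tabulation itself. The sketch below uses only the totals printed above; the helper names and the 10 per cent. bias in the second part are hypothetical, chosen merely to illustrate that a proportional bias common to both terms leaves the ratio untouched.

```python
# Totals from the Board of Trade tabulation above (pounds sterling).
total_1898_old_basis = 294_014_000   # ships not included
total_1899_old_basis = 320_485_000   # same basis as 1898
total_1899_new_basis = 329_680_000   # includes new ships exported


def percentage_change(later, earlier):
    return 100 * (later - earlier) / earlier


like_for_like = percentage_change(total_1899_old_basis, total_1898_old_basis)
mixed_bases = percentage_change(total_1899_new_basis, total_1898_old_basis)
print(round(like_for_like, 2), round(mixed_bases, 2))


def ratio_error(true_ratio, bias_earlier, bias_later):
    """Relative error in a ratio when each term carries a proportional bias."""
    estimated = true_ratio * (1 + bias_later) / (1 + bias_earlier)
    return estimated / true_ratio - 1


# Hypothetical biases: the same 10 per cent. in both years, then a
# correction applied to the later year only.
same_bias = ratio_error(1.1, 0.10, 0.10)
corrected_later_only = ratio_error(1.1, 0.10, 0.0)
print(round(same_bias, 4), round(corrected_later_only, 4))
```

Comparing the new total for 1899 with the old total for 1898 would show a rise of about 12 per cent., where the comparable figures show about 9 per cent.; and correcting the bias in one year only introduces into the ratio an error of about 9 per cent. that was absent while both estimates were equally wrong.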

[Sidenote: Results.]

To sum up the chief results of this chapter: there are two
processes which tend to accuracy, namely averaging, which
diminishes unbiassed errors, and comparison, which diminishes
biassed error. The errors in weights are
seldom  so  important  as  the  other  errors  which  are  present  in 
estimates.  Errors  in  a  result  cannot,  of  course,  be  calculated, 
but  can  be  expressed  in  terms  of  errors  in  the  items,  from 
which  it  comes;  we  cannot  attain  certainty,  but  we  can 
indicate  processes  which  diminish  errors,  and  with  the  help  of 
mathematics  measure  the  extent  of  diminution.  Initial  errors 
are  diminished  most,  when  we  calculate  the  ratios  of  weighted 
averages  of  similar  and  similarly  estimated  quantities.  Index- 
numbers,  which  we  discuss  in  the  next  chapter,  are  examples 
of  this  class. 

The  accuracy  resulting  from  the  process  of  sampling 
requires  more  mathematical  treatment,  and  is  dealt  with  in 
Part  II,  Chapters  II,  and  IV. 


CHAPTER  IX. 
INDEX-NUMBERS. 


The discussion of index-numbers supplies so good an illustration
of the principles laid down in the last chapter, and
index-numbers  are  so  important  in  themselves,  that,  though  it  is 
our  intention  to  avoid  special  questions,  it  will  be  worth  while 
to  devote  a  chapter  to  them. 

[Sidenote: Function of index-numbers.]

Index-numbers are used to measure the change in some
quantity which we cannot observe directly, which we know to
have a definite influence on many other quantities
which we can so observe, tending to increase all,
or diminish all, while this influence is concealed by the action
of  many  causes  affecting  the  separate  quantities  in  various  ways. 
Thus,  to  take  three  of  the  quantities  to  which  index-numbers 
are  applied,  the  change  in  the  relation  of  the  precious  metals 
to the work to be done by them affects prices of all commodities,
but very many other causes are at work affecting the
prices  of  separate  groups  of  commodities;  there  are  general 
causes  tending  to  raise  the  wage  of  a  week’s  work  of  average 
skill,  but  this  general  increase  is  concealed  by  numberless  minor 
causes  affecting  different  grades  of  labour  in  different  degrees ; 
the  change  in  the  consumption  of  goods  by  the  working  or  other 
classes  is  a  sufficiently  definite  quantity,  but  it  can  only  be 
measured  indirectly  by  observing  the  varying  changes  in  the 
consumption  of  individual  articles. 

The  use  of  index-numbers  is  not,  however,  confined  to  these 
instances,  but  is  nearly  co-extensive  with  the  field  of  statistics ; 
for  we  have  limited  the  term  statistics  to  the  measurement  of 
complex  groups  and  their  changes ;  the  object  of  statistics  is  to 
measure the action of the general laws which govern a heterogeneous
group, and the changes produced by general forces can
be measured, as a rule, only by their effect in individual cases;
thus the method of index-numbers is at once applicable to the
disentanglement of that which is common to the whole group
from  those  variations  which  are  special  to  individual  items. 

[Sidenote: Nature of index-numbers.]

In the more restricted sense a series of index-numbers is a
series of weighted averages, calculated periodically, where the
quantities averaged are similar (prices or wages),
and the weights are defined so as to give the
actual average of the whole group concerned in each measurement.
In its less restricted sense a series of index-numbers is a
series  which  reflects  in  its  trend  and  fluctuations  the  movements  of 
some  quantity  to  which  it  is  related.  Where  the  weights  and  the 
quantities  are  both  known  exactly,  the  method  of  index-numbers 
is  merely  a  convenient  way  of  expressing  straightforward 
arithmetical  results  in  a  simple  manner ;  this  simplicity  can  be 
nearly  realised  in  index-numbers  of  prices  of  exports.  Where 
the  quantities  are  samples  selected  from  a  wide  group,  and  there 
is  no  obvious  method  of  deciding  their  relative  importance,  the 
index-numbers  have  a  less  direct  relation  to  the  movement  of  a 
definable  and  measurable  phenomenon;  such  is  the  nature  of 
most  price  index-numbers  and  of  some  wage  index-numbers. 
Where  the  quantities  are  not  direct  measurements  of  examples 
of  the  phenomena  which  it  is  desired  to  study,  but  of  allied 
phenomena,  then  the  connection  between  the  series  of  index- 
numbers  and  the  phenomena  is  indirect ;  such,  in  fact,  are  most 
of the index-numbers of wages and of employment.*

* Parts of these pages are taken from the Statistical Journal, 1912, pp.
791-5.

The most ordinary way of forming an index-number is intermediate
between the extremes of exactness and of indirect
relation. Thus in the Labour Department’s index-number of
the change of rates of wages, the objective is presumably to find
numbers, year by year, whose ratios are the same as the ratios
of the average rates of weekly wages of persons in regular
industrial work in the United Kingdom; at least, the numbers
are generally quoted in this sense, and the heading in the
Abstract of Labour Statistics is “General Course of Wages in
the United Kingdom” (e.g., XVIth Abstract, Cd. 7131, p. 82).
This index-number is obtained by selecting some hundreds of
recognised time or piece rates, expressing each as a percentage
of its amount in 1900, and averaging the results year by year.
The choice of weights in this average is indirect; each of the
five groups (building, coal-mining, engineering, textiles,
agricultural) is taken as of the same importance, while the building
group contains 74 items, agriculture 115, etc. Mr. Sauerbeck
obtains  his  index-number  of  prices  by  selecting  the  prices  of 
typical commodities, and weights them by the device of duplicating
quotations for the more important. Thus in these and
other cases we have a selection of “quantities,” whether
deliberately or by accident, and an assignment of “weights,”
whether  directly  or  indirectly.  It  is  then  hoped  that  the 
numbers  will  move  in  direct  proportion  to  the  phenomenon, 
average  wage  or  average  level  of  prices,  whose  measurement  is 
attempted. 
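
Both indirect devices just described amount to assigning weights, as a short sketch may show. The commodity names and price relatives below are hypothetical; the item counts of 74 and 115 are those quoted above for the building and agricultural groups.

```python
from statistics import mean

# Hypothetical price relatives (per cent. of the base-year price).
relatives = {"wheat": 90, "cotton": 110, "tin": 130}

# Duplicating the quotation for wheat counts it twice in a simple average.
by_duplication = mean([90, 90, 110, 130])

# The same result from explicit weights of 2, 1 and 1.
weights = {"wheat": 2, "cotton": 1, "tin": 1}
by_weights = (
    sum(weights[name] * relatives[name] for name in relatives)
    / sum(weights.values())
)
print(by_duplication, by_weights)

# Five groups taken as of equal importance: each item's implicit weight
# is one-fifth divided by the number of items in its group.
building_item_weight = (1 / 5) / 74
agriculture_item_weight = (1 / 5) / 115
print(round(building_item_weight / agriculture_item_weight, 2))
```

A single building rate thus counts for rather more than one and a half times as much as a single agricultural rate, though no weight has been assigned to either explicitly.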

In  such  cases  three  points  call  for  consideration — (1)  The 
nature  and  extent  of  the  group  and  the  nature  of  its  special 
property  whose  general  change  is  studied.  (2)  The  method  of 
choosing  samples.  (3)  The  effect  of  weights.  (1)  With  Mr. 
Sauerbeck’s numbers the group is Prices of wholesale commodities
in the United Kingdom; and with other index-numbers
the groups are the prices of goods exported, of goods imported,
and so on. In the Labour Department’s wage index the group is
two-fold and consists of (a) rates of weekly time wages, (b) piece
rates, and the result is hybrid. It is essential to define both the
extent of the group and the property or attribute which is to be
measured. The property is sometimes elusive, as the “purchasing
power of money” or “the amount of unemployment,” and in
such  cases  we  have  to  define  and  measure  an  allied  attribute, 
such  as  the  level  of  prices  or  the  number  unemployed  according 
to  some  chosen  definition  of  unemployment.  (2)  In  choosing 
samples,  the  rule  generally  followed  is  to  take  only  those  where 
the  definition  is  adequate  and  the  measurement  accurate,  and 
in  the  best  known  index-numbers  the  choice  is  then  so  limited 
that  all  quotations  which  satisfy  the  rule  are  included.  It  very 
often  happens  that  in  this  way  the  definition  of  the  group  must 
be  reconsidered  and  limited.  Thus  if  we  start  out  to  measure 
prices  in  general,  the  necessities  of  definition  generally  limit  us 
to wholesale prices of goods which have regular market quotations;
and in wages the Labour Department is limited to cases
where  wages  or  rates  are  agreed  on  or  standardised  (except 
in  the  case  of  agriculture).  In  order  that  the  resulting  index- 
number  should  be  subject  to  the  analysis  of  the  law  of  error  the 
samples  should  be  random  and  independent  in  their  fluctuations 
from  the  general  movement ;  dependence  increases  the  number 
of samples necessary for an assigned precision. Randomness
may, perhaps, be secured by the accidents which make the
samples  eligible ;  this  is  probably  the  case  with  wholesale  prices 
but  not  with  wages.  Where  the  selection  is  biassed,  we  may 
sometimes  obtain  safety  by  further  restriction  of  the  definition 
of  the  group.  (3)  If  the  number  of  independent  quantities  is 
at  all  considerable,  any  reasonable  system  of  weights  is  likely 
to  give  as  good  a  result  as  the  conditions  of  the  problem 
allow. 

Suppose that the changes in a group of quantities are determined
by one general force which acts on all in the same sense,
that  is,  tends  to  increase  all  or  decrease  all,  and  by  several  other 
forces  each  of  which  acts  on  one  or  more  of  the  quantities,  and 
some  of  which  tend  to  increase,  others  to  decrease  the  quantities 
they  affect ;  then  of  the  special  forces,  some  will  tend  to  increase, 
others  to  diminish  the  average,  while  the  general  force  will 
have  a  cumulative  effect  entirely  towards  increasing,  or  entirely 
towards  diminishing  it.  If  the  separate  effects  of  the  special 
forces  are  small  compared  with  their  number,  they  will  tend  to 
neutralise  one  another  in  their  influence  on  the  average ;  and 
the  change  in  the  average  will  show  the  influence  of  the  general 
cause  only.  In  the  language  of  the  last  chapter,  the  special 
forces  produce  unbiassed  changes,  which  are  negligible  in  their 
effect  on  an  average,  in  comparison  with  the  biassed  changes 
produced  by  the  general  force. 
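
A small simulation illustrates how the special forces neutralise one another. Every number in it is hypothetical, and the special effects are assumed to be independent and normally distributed, which the text does not require.

```python
import random

rng = random.Random(1901)   # fixed seed so that the run is repeatable

GENERAL_EFFECT = 5.0     # hypothetical rise common to every quantity
SPECIAL_SPREAD = 10.0    # hypothetical typical size of the special effects
N = 10_000               # number of quantities averaged

# Each quantity changes by the general effect plus its own special effect.
changes = [GENERAL_EFFECT + rng.gauss(0.0, SPECIAL_SPREAD) for _ in range(N)]

average_change = sum(changes) / N
largest_single_departure = max(abs(c - GENERAL_EFFECT) for c in changes)

print(round(average_change, 2), round(largest_single_departure, 1))
```

Although single quantities depart widely from the general movement, the average change lies close to the general effect alone.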

It  appears  from  consideration  of  many  of  the  index-numbers 
in  ordinary  use  that  the  quantities  actually  measured  are  not 
those  whose  general  movement  we  wish  to  know.  Wholesale 
prices  do  not  move  with  retail  prices  in  accordance  with  any 
simple  law,  either  of  constant  difference  or  constant  ratio ; 
standard  wages  differ  in  an  unknown  way  from  average  wages ; 
piece-rates  have  a  varying  and  unknown  relation  to  earnings. 
We do not get any such simple relations between the quantity
that can be measured and the property that is really in question
as $y = x$, or $y = kx$, or $y = a + bx$; but rather $y = f(x)$, where
the form of the function is unknown. In order that the index-number
may be intelligible, $y = a + bx$ must be a good approximation
over the ordinary range of $x$; for extreme values of $x$
terms of higher powers may become important and the resulting
index untrustworthy. Here $a$ disappears in the process of
forming an index-number. It is often difficult to determine $b$,
which measures the ratio of a change in $y$ to that of a change
in $x$.

The resulting index-number for any assigned year may be
defined and expressed thus: Let $x_1, x_2, \ldots, x_n$ be $n$ quantities
whose general movement is to be studied. Let $y_1, y_2, \ldots, y_r, \ldots,
y_n$ be measured quantities related to the former by equations
of a form to which $y_r - 100 = b_r(x_r - 100)$ is a good approximation.*
Let suitable weights $w_1, w_2, \ldots$ be assigned and write

$$J = \frac{w_1x_1 + w_2x_2 + \cdots}{w_1 + w_2 + \cdots}, \qquad I = \frac{w_1y_1 + w_2y_2 + \cdots}{w_1 + w_2 + \cdots}.$$

* This equation can readily be obtained by a rearrangement from
$y = a + bx$.

Then $J$ is the incalculable theoretic index-number whose changes
express the movement, and $I$ is the index-number calculated
and used.

$$I - 100 = \frac{\Sigma w(y - 100)}{\Sigma w} = \frac{\Sigma wb(x - 100)}{\Sigma w}.$$

Let $b_1 = k + d_1$, $b_2 = k + d_2$, ..., where $k$ is chosen, if
possible, as that average of the $b$’s which makes

$$\frac{\Sigma wd(x - 100)}{\Sigma w} \quad (= F, \text{ say})$$

small for the ordinary range of values of the $x$’s.

Then

$$I - 100 = k\,\frac{\Sigma w(x - 100)}{\Sigma w} + \frac{\Sigma wd(x - 100)}{\Sigma w} = k(J - 100) + F.$$

If the $x$’s have, in general, only a moderate range of values, if
the $b$’s are nearly equal and extreme values of $b$ do not coincide
with extreme values of $w$, then $F$ is small and its variation from
year to year negligible.

In this case $I$ is so related to $J$ that it equals 100 in the
standard year when $J$ (and every $x$ and $y$) is 100, and a change in
its value is very nearly $k$ times the change in $J$, where $k$ is an
average of the $b$’s which measure the ratio of the changes of
the various $y$’s to those of the corresponding $x$’s.

If we try to make a retail price index-number out of wholesale
prices, the $b$’s are not known, and presumably differ greatly
from one commodity to another, from time to time, and vary in
an unknown way when prices are specially high or low. Hence
the connection between general retail prices and wholesale prices
is not so close as to allow the statement that a change in the
one is directly proportional to a change in the other. In the
case of the Labour Department’s index-number of wages, the
changes in time-rates have not the same relation to earnings
as have those in piece-rates, and in neither group is the relation
known; that is, the $b$’s are unknown and are not equal. So
far as piece-rates are concerned, when these rates are rising it
often happens that earnings rise more rapidly ($b$ less than 1),
and when they fall that the earnings fall less rapidly ($b$ greater
than 1); that is, the $b$’s are not constant and $F$, in the formula
just given, is unknown and not negligible.

If  the  b’s  are  equal,  F  is  zero,  and  k  could  be  determined  by 
special  examinations  in  two  years.  Then  the  movements  of  I 
would  reflect  faithfully  the  movements  of  J  on  a  known  scale. 

The actual relations are not known; if an $x$ is 4 per cent.
above the average, we do not assume that $y$ is also 4 per cent.
above its average, but assume that its deviation is 4 per cent.
× $b$, where $b$ is nearly constant. The $b$’s differ from some
(weighted)  mean  value  (k),  and  it  is  assumed  that  the  effect  of 
these  differences  nearly  disappears  when  the  average  is  taken, 
and  that  the  mean  value,  k,  is  nearly  constant  from  year  to 
year.  Various  hypotheses  can  be  made  as  to  the  values  of 
the  b’s  and  the  resulting  value  of  k,  and  the  fluctuations  of  the 
index-numbers  interpreted. 

It  is  essential  that  when  an  x  returns  to  a  value  after  a 
fluctuation,  the  corresponding  y  shall  return  to  its  former  value, 
or  at  least  that  any  differences  shall  be  small  and  unbiassed. 
This  condition  would  be  broken  if  wholesale  prices  were  used  to 
measure  the  changes  in  retail  prices,  while  the  relation  between 
the  two  gradually  changed,  as  presumably  it  does.  It  is  broken 
in  the  Labour  Department’s  index  of  the  general  course  of 
wages,  in  so  far  as  changes  in  standard  wages  or  piece-rates 
have  a  varying  relation  to  changes  in  average  wages. 

[Sidenote: The Board of Trade index.]

There are many index-numbers of wholesale prices extant,
some of which we may pass in review. The Board of Trade
publish the recorded quantity and value of goods
imported and exported, and the average prices of
these goods can be calculated. Those commodities are selected
which  occur  in  the  returns  for  the  whole  period  chosen.  A 
particular  year  is  chosen  as  base ;  then  the  goods  are  valued 
in  all  other  years  separately  at  their  prices  in  the  base  year; 
the  total  of  these  values  in  any  year  is  the  sum  which  the 
goods  would  have  been  worth  if  their  prices  had  remained 
unchanged;  the  ratio  of  this  value  to  that  actually  recorded 
is  the  ratio  of  their  average  price  in  the  base  year  to  their 
average  price  in  the  other  year  selected  (if  the  term  average 
is used broadly), and if the first term of this ratio is equated
to 100, the second term is the index-number required for the
year  selected,  expressed  as  a  percentage  of  the  number  for  the 
base  year.  It  is  at  once  evident  that  we  are  here  dealing  with 
weighted  averages. 

[Sidenote: Systems of weights.]

Let $p_1, p_2, p_3, \ldots$ be the prices in the base year of units
of the goods selected, and $r_1p_1, r_2p_2, r_3p_3, \ldots$ the prices in
the year for which we require an index-number:
then $r_1, r_2, r_3, \ldots$ measure the changes of prices
for the separate commodities, and these $r$’s are the samples
from which we are to deduce the general change of price.
The weights used in the process described may be found
thus: let $b_1, b_2, b_3, \ldots$ be the numbers of units of goods in
the selected year; then the total value in the selected year
at the prices of that year is $(b_1r_1p_1 + b_2r_2p_2 + \ldots)$, and at the
prices of the base year is $(b_1p_1 + b_2p_2 + \ldots)$; the ratio is
$\Sigma brp : \Sigma bp$, and the index-number for the selected year is

$$100 \times \frac{\Sigma brp}{\Sigma bp} = 100 \times \frac{\Sigma (r \cdot bp)}{\Sigma bp}.$$

Here  the  weights  applied  to  the  r’s  are  the  values  which  the 
corresponding  goods  in  the  selected  year  would  have  borne  at 
the  prices  of  the  base  year.  It  is  clear  that  the  selection  of  the 
standard  year  affects  the  weights,  for  any  particular  commodity 
can  be  given  special  weight  by  choosing  as  base  a  year  in  which 
its  price  is  high,  and  much  trouble  has  been  spent  in  searching 
for a “normal” year; but though the weights of separate commodities
are affected, it does not follow that the average will
be altered, and we should expect from the principle laid down
above that the change would be very slight. In fact we have
the following figures:


INDEX NUMBERS OF 1886 AND 1883 COMPARED.*

                     Imports.                         Exports.
Weights:     Values at the prices of          Values at the prices of
Year.      1873    1883    1861    1881     1873    1883    1861    1881

1883       100     100     100     100      100     100     100     100
1886       81.7    82.1    82.9    82.3     88½     88      87      89

* From the Economic Journal and the Statistical Journal, both June
1897.

It is possible to produce figures which show a variation
caused  by  a  change  of  base  year,  but  it  is  done  by  choosing 
samples  which  lend  themselves  to  the  special  argument. 
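
That the Board of Trade process yields a weighted average of the r’s may be confirmed with a few hypothetical goods. The prices, ratios and quantities below are invented for illustration; only the two formulae are taken from the text.

```python
# Hypothetical goods: base-year prices p, price ratios r, and quantities b
# in the selected year.
p = [10.0, 4.0, 25.0]
r = [0.8, 1.1, 0.9]
b = [50.0, 200.0, 10.0]

# Total value of the selected year's goods at its own prices and at the
# prices of the base year.
value_at_current_prices = sum(bi * ri * pi for bi, ri, pi in zip(b, r, p))
value_at_base_prices = sum(bi * pi for bi, pi in zip(b, p))
index = 100 * value_at_current_prices / value_at_base_prices

# The same number obtained as a weighted average of the r's, the weight
# of each r being the value b * p at base-year prices.
weights = [bi * pi for bi, pi in zip(b, p)]
weighted_average = (
    100 * sum(wi * ri for wi, ri in zip(weights, r)) / sum(weights)
)

print(round(index, 2), round(weighted_average, 2))
```

Both lines give about 97.1, the ratio of a value of 1505 to one of 1550.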

Since  so  great  an  alteration  in  choice  of  weights  makes  so 
little  difference,  it  is  worth  while  to  see  if  we  need  even  keep 
the  weight  due  to  the  quantities  imported  (the  b’s  in  the  above 
formulae).  The  following  table  may  be  quoted  *  to  show  that 
these  weights  even  have  little  influence  : — 


Index-Numbers  for  1895,  when  that  of  1881  is  100,  obtained  by 

Various  Systems  of  Weighting. 


Ratios  of 

Prices  (rlt  r2.  .  .) 

Reciprocal 
of  A.M. 

of  i,  -L , 

Tl  *-2 
&C. 

Weighted 
by  Values 
of  1895 
Quantities 
at  1881 
Prices. 

Weighted 
by  Declared 
Values  in 
1881. 

Arithmetic 

Mean. 

Median. 

Geometric 

Mean. 

Economist' s 
Figures. 

Imports 

67* 

69 

73* 

72* 

72* 

69 

}- 

Exports 

83 

87 

82 

8l 

OO 

75 

Let  b1}  b2  .  .  .  be  quantities  and  p1}  p2  .  .  .  prices  in  1881, 
and  let  clt  c2  .  .  .  be  quantities  and  rxpX)  r2p2  .  .  .  prices  in  1895. 
The  first  column  gives  the  result  of — 


Sum  of  1895  quantities  at  1895  prices 
Sum  of  1895  quantities  at  1881  prices 

=  IQQg^^~>  anc^  the  weights  applied  to  the  r’s  are  the  1895 

quantities  valued  at  the  1881  prices. 

The  second  column  gives  the  result  of — 


100  X 


Sum  of  1881  quantities  at  1895  prices 


Sum  of  1881  quantities  at  1881  prices 
_  iqoSMA  and  the  weights  applied  to  the  r’s  are  the  declared 

SbxPi 

values  of  1881. 


In the next three columns the arithmetic mean, the median,
and the geometric mean of the r's are given. In the last column
but one the arithmetic mean of $\frac{1}{r_1}, \frac{1}{r_2}, \ldots$, that is of the ratios

*  From  the  Economic  Journal  (with  a  correction  in  the  statement  of 
weights). 
of the prices of 1881 to 1895, is calculated, and the ratio of this
mean  to  100  equals  the  ratio  of  100  to  a  new  index-number, 
which  corresponds  to  the  former  arithmetic  mean  with  the 
years  1881  and  1895  interchanged.  The  figure  in  the  last 
column  is  calculated  from  material  given  in  the  Economist ; 
every  year  the  imports  and  exports  are  valued  at  their  prices 
in  the  previous  year,  and  thus  an  annual  ratio  is  given  similar 
to  that  in  the  first  column  of  figures  in  the  table  just  given; 
the  number  100,  taken  for  1881,  is  multiplied  by  this  annual 
ratio  year  by  year  till  1895,  and  the  number  71  is  the  result. 
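[Algebraically this index-number is

$$100 \times \frac{S\,c' p'}{S\,c' p} \times \frac{S\,c'' p''}{S\,c'' p'} \times \cdots,$$

where $p, p', p'', \ldots$ are the prices of 1881, 1882, 1883, ... and
$c', c'', \ldots$ the quantities of 1882, 1883, ....]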


A  more  complete  analysis  of  these  figures,  and  an  investiga¬ 
tion  as  to  the  causes  of  the  divergence  between  the  export 
indices  87  and  75,  would  show  which  of  the  methods  should  be 
adopted.  Here  we  will  be  content  with  noticing  that  the 
unweighted  average,  82,  is  very  near  the  first  weighted 
average,  83. 

Further methods of dealing with such weights are given on
pp. 209-211, under Retail Index-Numbers.

[Marginal note: Objective measure.]
The advantage of index-numbers on the Board of Trade
basis is that they measure approximately an objective quantity,
and a result is obtained which can be stated in terms which
appeal to the ordinary man who is not a statistician: such as,
“The imports of 1895 would have cost half as much again if
their prices had been those of 1881;” but it does not follow
that this index is the best measure of the less-definable
quantity, “Fall in the price of imports,” where we imagine a
general cause affecting this class of commodities whose action
is modified by other partial causes.

[Marginal note: Choice of base year.]
It is important to choose a normal year or the average of a
period as base, for the choice of year affects the effective
weights in subsequent comparisons. Using the following notation:

Weights   Price in     Price in       Price in      Ratio of Prices in Third
chosen.   Base Year.   Second Year.   Third Year.   and Second Years.
  w1        100          100 r1         100 r1'       R1 = r1' / r1
  w2        100          100 r2         100 r2'       R2 = r2' / r2
  ...

and writing 100, $I_1$, $I_2$ as the index-numbers in the three years,
we have

$$100 : I_1 : I_2 = \frac{S\,w\,100}{S\,w} : \frac{S\,w\,100\,r}{S\,w} : \frac{S\,w\,100\,r'}{S\,w},$$

$$I_2 = I_1 \times \frac{S\,w r'}{S\,w r} = I_1 \times \frac{S\,(w r \cdot R)}{S\,w r}.$$

Whereas if we had taken the prices all as 100 in the second year
we should have

$$I_2 = I_1 \times \frac{S\,w R}{S\,w}.$$

If the averages were unweighted we should still have the same
difficulty, for then the values would be $\frac{S\,rR}{S\,r}$ on one system and
$\frac{S\,R}{n}$ on the other.

Since  errors  in  weights  have  under  ordinary  circumstances  but 
little  effect,  it  is  only  when  a  quite  abnormal  base  year  is  chosen, 
or  when  prices  are  moving  very  irregularly,  that  this  consideration 
becomes  important. 
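
The dependence on the base year can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the weights and price ratios are hypothetical figures chosen only to make the effect visible, and the function name `weighted_index` is an assumption of this sketch.

```python
def weighted_index(weights, ratios):
    """Weighted arithmetic mean of price ratios, on a base of 100."""
    return 100 * sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

# Hypothetical figures, chosen only for illustration.
w = [3, 1]             # weights chosen
r = [0.5, 2.0]         # second-year prices as ratios to the base year
r_dash = [0.6, 2.0]    # third-year prices as ratios to the base year
R = [b / a for a, b in zip(r, r_dash)]   # third year relative to second

# First year as base: I2 / I1 = S(wr.R) / S(wr)
link_first_base = weighted_index(w, r_dash) / weighted_index(w, r)

# Second year as base: I2 / I1 = S(wR) / S(w)
link_second_base = weighted_index(w, R) / 100

# The first form is a mean of the R's with effective weights w*r.
effective = [wi * ri for wi, ri in zip(w, r)]
link_effective = weighted_index(effective, R) / 100

print(round(link_first_base, 4), round(link_second_base, 4))
```

With these figures the two links differ (roughly 1.086 against 1.15), because taking the first year as base replaces the chosen weights w by the effective weights wr.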

[Marginal note: Geometric mean.]
Professor Edgeworth has pointed out that the use of the
geometric mean avoids this difficulty in the case of
unweighted averages. In the same notation:

$$100 : I_1 : I_2 = 100 : 100\sqrt[n]{r_1 r_2 \cdots r_n} : 100\sqrt[n]{r_1' r_2' \cdots r_n'},$$

$$I_2 = I_1 \times \frac{\sqrt[n]{r_1' r_2' \cdots r_n'}}{\sqrt[n]{r_1 r_2 \cdots r_n}} = I_1 \times \sqrt[n]{R_1 R_2 \cdots R_n},$$

so that the same result is obtained for the comparison of two years
whatever year is taken as base.*
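
The invariance of the geometric mean can be verified in a few lines. This is an illustrative Python sketch with hypothetical price ratios; the helper `geometric_index` is a name introduced here, not one from the text.

```python
import math

def geometric_index(ratios):
    """Unweighted geometric mean of price ratios, on a base of 100."""
    return 100 * math.prod(ratios) ** (1 / len(ratios))

# Hypothetical price ratios to the first (base) year.
r = [0.5, 2.0, 1.2]        # second year
r_dash = [0.6, 2.0, 0.9]   # third year
R = [b / a for a, b in zip(r, r_dash)]   # third year relative to second

# Comparison of third with second year, first year as base.
link_first_base = geometric_index(r_dash) / geometric_index(r)

# The same comparison with the second year itself as base.
link_second_base = geometric_index(R) / 100

print(link_first_base, link_second_base)
```

Up to floating-point rounding the two printed values agree, which is the property claimed for the geometric mean.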


[Marginal note: Other index-numbers.]
Mr. Sauerbeck and the Economist both avoid in part the
difficulty of weighting the separate ratios by their relative
importance in consumption, by selecting from those
commodities whose prices are most accurately
determined more instances of such widely consumed articles
as wheat than of less important commodities such as linseed.
Mr. Sauerbeck has, in his annual articles in the Journal of the
Royal Statistical Society, verified the correspondence of the
unweighted average of his 45 ratios with the average of the same
weighted on various principles.†

* On this point, and on others in this chapter, see article Index-Numbers,
in Palgrave's Dictionary of Political Economy.

† See, for example, Statistical Journal, 1900, pp. 97, 98.

[Marginal note: Importance of right choice of samples.]
While the choice of the special weights to be employed is,
when the number of ratios taken is at all considerable, quite
unimportant, the choice of the quantities dealt with has great
effect on the result. Thus import figures, relating to raw
materials and the produce of other countries, do not lead to
the same index-numbers as export figures dealing with the
price of our own produce, though the tables just given show
that they are little affected by weights; and neither of these
agree closely with Mr. Sauerbeck's or the Economist's numbers,
and these again are not in complete agreement. The samples on
which these four sets of numbers are based are from different
groups of commodities, and the numbers show that the same
forces do not affect these groups in the same degree. When we
have so multiplied our samples, that we can subdivide them
without affecting the index-numbers deduced, we may expect our
results to represent the required measurement.*

[Marginal note: Great advantage of the median.]
If we compare the Economist index-numbers with Sauerbeck's
during the period 1860-70, we see that the former show
a very much greater increase during the cotton
famine than the latter. An index-number which
can  be  greatly  disturbed  by  fluctuations,  however  violent,  in 
only  one  group  of  commodities,  is  clearly  wanting  in  some  of 
the  chief  qualities  of  a  general  measure  of  price  levels.  A  very 
simple  means  of  avoiding  this  difficulty,  and  indeed  all  the 
intricacies  of  weighting,  is  to  take  the  median  of  all  the  price 
ratios  of  a  particular  year  as  the  index-number  of  that  year. 
It  is  perhaps  impossible  to  show  theoretically  that  any  other 
average  satisfies  the  required  conditions  better  than  the  median, 
if  a  sufficient  number  of  items  are  included,  and  there  can  be 
no  doubt  that  it  is  practically  the  easiest  to  calculate. 
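
The steadiness of the median under a violent movement in a single commodity can be illustrated as follows. This is a small Python sketch with invented price ratios; it does not reproduce the Economist's or Sauerbeck's figures.

```python
from statistics import mean, median

# Hypothetical price ratios for five commodities in an ordinary year.
ratios = [0.95, 1.00, 1.02, 1.05, 1.10]

# The same year, but with one commodity (say cotton) trebled in price.
shocked = ratios[:-1] + [3.00]

mean_before, mean_after = 100 * mean(ratios), 100 * mean(shocked)
median_before, median_after = 100 * median(ratios), 100 * median(shocked)

print(mean_before, mean_after)       # the arithmetic mean jumps
print(median_before, median_after)   # the median is unmoved
```

With so few items the contrast is exaggerated, but the same behaviour holds whenever the disturbed ratios lie on one side of the middle of the series.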

[Marginal note: Proposed standard.]
If, on the other hand, paucity of data makes the inclusion of
weights necessary, and the popular desire for concrete
measurements makes a fine show of weighting expedient,
we perhaps cannot do better than to adopt such
a  standard  as  that  proposed  by  the  Committee  of  the  British 
Association,  for  the  construction  of  an  index-number,  which 
might  be  the  basis  of  business  transactions  involving  future 
payments.  This  standard  is  as  follows  : — 


*  Mr.  Sauerbeck's  numbers  are  to  be  found  in  annual  articles  by  him 
in  the  Statistical  Journal  ;  and  a  diagram  showing  them  from  1820  is 
published  by  P.  S.  King  &  Son. 
Basis of Index-Number recommended by the Committee appointed by
the Economic Section of the British Association, 1888.

                         Estimated
                         Expenditure
                         per Annum
Articles.                on each        Prices to be taken from
                         (000,000's
                         omitted).

Wheat                      £60          Gazette average, English wheat.
Barley                      30          Gazette average, English barley.
Oats                        50          Gazette average, English oats.
Potatoes, rice, &c.         50          Av. import price, potatoes.
Meat                       100          Market quotations, live meat, Smithfield.
Fish                        20          Board of Trade Returns; average per cwt. landed.
Cheese, butter, milk        60          Cheese and butter, average import price.
Sugar                       30          Av. import price, refined sugar.
Tea                         20          Av. import price, tea.
Beer                       100          Av. export price, beer.
Spirits                     40          Av. import price, spirits.
Wine                        10          Av. import price, wine.
Tobacco                     10          Av. import price, tobacco.
Cotton                      20          Av. import price, cotton.
Wool                        30          Av. import price, wool.
Silk                        20          Av. import price, raw silk.
Leather                     10          Av. import price, hides.
Coal                       100          Av. export price, coal.
Iron                        50          Market price, Scotch pig-iron.
Copper                      25          Av. import price, copper ore.
Lead, zinc, tin             25          Av. import price, lead ore.
Timber                      30          Average import price.
Petroleum                    5          Average import price.
Indigo                       5          Average import price.
Flax and linseed            10          Average import price.
Palm oil                     5          Average import price.
Caoutchouc                   5          Average import price.

[The entries of the column headed "Hence Weights assigned" are not legible in this copy.]

American statisticians have adopted a method of comparing
totals instead of weighted or unweighted price-ratios for the
formation of index-numbers. “By so doing, it is maintained,
two difficulties are overcome: First, the problem of choosing
a base year, since actual prices do not necessarily have to be
reduced to a relative basis, and, second, of deciding on an
appropriate average of relatives.” * In fact the method,
though it may have advantages in intelligibility and simplicity
of construction, introduces no new principle. It may be thus
described: The price of each article in, say, 1914 is multiplied
by the quantity marketed in the last census year, 1909; the
price in, say, 1912 is multiplied by the same quantity. With
the aggregate for 1914 as the base, or 100, the index-number
for 1912 is obtained by comparing the 1912 aggregate with the
1914 aggregate. If $w_1, w_2, \ldots$ are the quantities, $P_1, P_2, \ldots$
the prices in 1914 and $p_1, p_2, \ldots$ those in 1912, the aggregates
are $S\,wP$, $S\,wp$, and the index is

$$100\,\frac{S\,wp}{S\,wP} = 100\,\frac{S\,wp}{S\,(wp \cdot R)},$$

where $R_1, R_2, \ldots$ are the price ratios 1914 to 1912. This is equivalent
to the Board of Trade index discussed above, and has no special
claim to accuracy.

* Secrist, An Introduction to Statistical Methods, 1917, pp. 329 and 339,
340. See the Bulletin of the United States Bureau of Labor Statistics, Whole
Number 181, October 1915.
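
That the method of aggregates is only a weighted average of price ratios in another dress can be confirmed with a short Python sketch. The quantities and prices below are hypothetical, and the variable names are chosen for this illustration only.

```python
# Hypothetical census-year quantities and prices for three articles.
w = [10, 4, 6]         # quantities marketed in the census year
P = [2.0, 5.0, 1.5]    # prices in 1914 (the base)
p = [1.6, 4.5, 1.5]    # prices in 1912

aggregate_1914 = sum(wi * Pi for wi, Pi in zip(w, P))
aggregate_1912 = sum(wi * pi for wi, pi in zip(w, p))

# Index for 1912 by comparison of aggregates, 1914 = 100.
index_aggregates = 100 * aggregate_1912 / aggregate_1914

# The same index as an average of the ratios p/P weighted by the
# 1914 values wP.
values_1914 = [wi * Pi for wi, Pi in zip(w, P)]
index_relatives = 100 * sum(
    v * pi / Pi for v, pi, Pi in zip(values_1914, p, P)
) / sum(values_1914)

print(index_aggregates, index_relatives)
```

Both expressions give the same number (about 87.8 here), so the choice of base year and of weights is still being made, though implicitly.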

[Marginal note: Retail price index.]
Since we can only obtain rough correspondence in dealing
with wholesale prices, we cannot expect to be able to measure
retail prices with any great precision. For we saw
in the preceding chapter that the error in an average
bears a definite relation to the errors in the items which
compose  it ;  if  the  errors  in  the  items  are  on  the  whole  doubled, 
it  is  likely  that  the  errors  in  the  average  and  in  the  ratio  of  two 
averages  will  also  be  doubled,  and  we  shall  need  four  times  *  as 
many  samples  to  restore  the  precision.  Unfortunately  the 
material  for  computing  a  retail  index-number  is  even  more  in¬ 
complete  than  that  for  wholesale  prices,  and  owing  to  the  smaller 
number  of  articles  that  can  be  included,  and  the  preponderance 
of  such  items  as  bread  and  rent,  the  question  of  weighting 
becomes  of  more  importance. 
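
The statement that doubled errors call for four times as many samples follows if the error of an average falls off as the square root of the number of items. The Python sketch below assumes that rule (independent errors of equal size); the function name and the numbers are illustrative only.

```python
import math

def error_of_average(item_error, n):
    """Error of an average of n items, assuming independent errors of
    equal size, so that the error falls off as 1 / sqrt(n)."""
    return item_error / math.sqrt(n)

wholesale = error_of_average(item_error=1.0, n=100)

# Item errors doubled, same number of samples: error of average doubles.
retail_same_n = error_of_average(item_error=2.0, n=100)

# Item errors doubled, four times the samples: precision restored.
retail_four_n = error_of_average(item_error=2.0, n=400)

print(wholesale, retail_same_n, retail_four_n)
```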

* See Part II, Chap. IV.

[Marginal note: Special difficulties.]
When we wish to construct an index-number to show the
purchasing power of money of special classes, we must take
into account some considerations which can be ignored when
dealing with wholesale price numbers. Different classes of
persons at the same time, and the same classes at different
times, spend their income in varying proportions on different
objects. If we could collect enough sufficiently accurate
samples, this fact would not matter so much; but it would still
be of some importance owing to the tendency to make increased
purchases of cheapening commodities. As it is, it would be
necessary to construct separate index-numbers for each class
and each district. The difficulty of insufficient and inaccurate
data cannot at present be overcome; but as it is possible that
we may in the future get definite records of retail prices
sufficiently numerous to make up for their want of precision,
we may glance at the other details of the problem. To form an
index-number for a particular class of people, we need records
of the method of expenditure of their income at all the dates in
question, of sufficient numbers to obtain the slight precision
which weighting needs. [Marginal note: Methods of weighting.]
Then if we had fairly good records of retail prices several
methods of weighting are open to us,* all of which are likely
to give nearly the same result. The necessity of weighting and
the methods are best shown by a numerical illustration.†

The  data  for  the  measurement  of  the  change  of  the  cost  of 
living,  however  it  is  defined,  are  always  of  the  same  nature,  and 
consist  of  records  of  the  quantities  of  various  commodities 
bought  and  the  prices  paid  for  them  at  two  dates  or  places  or  by 
representatives  of  different  social  groups.  Thus  we  have  given 
with  greater  or  less  accuracy — 


              Place or Date A.                   Place or Date B.
Commodity.    Quantity.  Price.  Expenditure.    Quantity.  Price.  Expenditure.
    1            Q1   x   P1   =   E1               q1   x   p1   =   e1
    2            Q2   x   P2   =   E2               q2   x   p2   =   e2
    3            Q3   x   P3   =   E3               q3   x   p3   =   e3
    n            Qn   x   Pn   =   En               qn   x   pn   =   en

* See article on Wages, Nominal and Real, in Palgrave's Dictionary of
Political Economy, pp. 640-41.

† Taken with part of the context from “The Measurement of Changes in
the Cost of Living,” Statistical Journal, 1919, pp. 343 seq.

In the Table on p. 210 are shown in this form the budgets
used in the Report of the Committee on Cost of Living, 1919.

The second year's budget at the first year's prices would
have cost 225.5d. instead of 455.5d. The index-number of
retail prices on this basis is

$$100 \times \frac{455.5}{225.5} \quad\text{or}\quad 100\,\frac{S\,qp}{S\,qP} = 202.0.$$

The weight applied to a price ratio p : P is qP. The index-number

$$= 100\,\frac{S\left(qP \cdot \frac{p}{P}\right)}{S\,qP} \qquad (a)$$

The first year's budget at the second year's prices would cost
521.6d. instead of 246.5d. The index-number on this basis is

$$100 \times \frac{521.6}{246.5} \quad\text{or}\quad 100\,\frac{S\,Qp}{S\,QP} = 211.6 \qquad (b)$$

The weight applied to a ratio p : P is QP or E. This is
the method used in the Labour Gazette to measure the “average
increase in retail prices.”

Urban Working-class Budgets. (Based on Cd. 8980, p. 18.)
Expenditure of Standard Family.

                                        1914.                    June, 1918.
                                  Q       P       E         q       p       e       p/P
                                Quan-   Price.  Expendi-  Quan-   Price.  Expendi-  Price
                                tity.           ture.     tity.           ture.     Ratio.
                                         d.      d.                d.      d.
 1. Bread and flour     lbs.    33.5    1.51    50.5      34.5    2.36    81.5     1.56
 2. Meat                 "       6.8    8.6     58.5       4.4   18.6     82.0     2.15
 3. Bacon                "       1.2   11.7     14.0       2.55  26.1     66.5     2.24
 4. Lard, suet, etc.     "       1.0    7.5      7.5        .78  17.9     14.0     2.29
 5. Eggs                No.     13      1.0     13.0       9.1    4.0     36.5     4.00
 6. New milk            pints    9.2    1.8     16.5      11.7    3.0     35.5     1.69
 7. Condensed milk      tins      .25   6.0      1.5        .59  14.5      8.5     1.42
 8. Cheese              lbs.      .84   8.9      7.5        .41  20.7      8.5     2.32
 9. Butter               "       1.70  14.4     24.5        .79  29.7     23.5     2.07
10. Margarine            "        .42   6.0      2.5        .91  12.1     11.0     2.01
11. Potatoes             "      15.6     .7     11.0      20      1.25    25.0     1.78
12. Rice and tapioca     "       1.4    3.2      4.5       1.3    5.8      7.5     1.82
13. Oatmeal              "       1.3    1.9      2.5       1.4    4.3      6.0     2.24
14. Tea                  "        .68  21.3     14.5        .57  33.3     19.0     1.56
15. Coffee               "        .09  16.7      1.5        .12  25.0      3.0     1.50
16. Cocoa                "        .18  19.4      3.5        .23  32.6      7.5     1.69
17. Sugar                "       5.9    2.2     13.0       2.83   7.07    20.0     3.21

    Total                                      246.5                     455.5
    Other food                                  52.5                     111.5
    Total                                      299.0                     567.0

S QP = 246.5    S qp = 455.5    S Qp = 521.6    S qP = 225.5
S e ÷ S E = 1.90    S Qp ÷ S QP = 2.12    S qp ÷ S qP = 2.02

In some cases there may be reasons for preferring (a) or
preferring (b). If not, it is reasonable to take a mean between
the results; the arithmetic mean is 206.8, the geometric mean
is 206.74, the harmonic mean 206.69, and it is usually indifferent
which we take. Or a method which may be commended for
its simplicity in idea is to take the averages of the quantities
seriatim ($\tfrac12(Q_1 + q_1)$, $\tfrac12(Q_2 + q_2)$, ...) and find their cost in each
year and compare their sums. This gives

$$\frac{S\,(Q + q)\,p}{S\,(Q + q)\,P} \times 100 = 203.7.$$

The weight applied to a ratio is now $(Q + q)P$ ..... (c)
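
The arithmetic of methods (a) and (b) and of their three means can be retraced from the four aggregates printed under the table. The Python sketch below uses only those printed totals; the variable names are introduced for this illustration.

```python
# Aggregates as printed beneath the table of budgets (in pence).
S_QP = 246.5   # 1914 budget at 1914 prices
S_qp = 455.5   # June 1918 budget at June 1918 prices
S_Qp = 521.6   # 1914 budget at June 1918 prices
S_qP = 225.5   # June 1918 budget at 1914 prices

index_a = 100 * S_qp / S_qP   # method (a): second year's budget as weights
index_b = 100 * S_Qp / S_QP   # method (b): first year's budget as weights

arithmetic = (index_a + index_b) / 2
geometric = (index_a * index_b) ** 0.5
harmonic = 2 * index_a * index_b / (index_a + index_b)

print(round(index_a, 1), round(index_b, 1))
print(round(arithmetic, 1), round(geometric, 2), round(harmonic, 2))
```

The three means differ only in the second decimal place, which is why the text treats the choice among them as indifferent.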
Another method is to take the average of the expenditures
at the two dates on each item as the weight for the price ratio
of that item, so obtaining

$$\frac{S\left(\tfrac12 (E + e) \cdot \frac{p}{P}\right)}{S\,\tfrac12 (E + e)} \times 100 = 19[?].6 \qquad (d)$$

This, however, involves the quantity $p^2/P$ in the numerator
and gives undue weight to exceptional movements of prices
of particular commodities.

In absence of knowledge of quantities the simple average
of the price ratios,

$$\frac{1}{n}\,S\,\frac{p}{P} \times 100 = 209.1 \qquad (e)$$

is sometimes taken; but it is never safe to neglect weights in
this problem, though it is not necessary to aim at great precision
in them.

Finally, a more complicated method has been advocated in
which it is supposed that the second total is expended in the
same proportion item by item as the first, and the quantities
of each item thus purchasable are valued at the price in the
first year. The ratio of the whole actual expenditure in the
first year (x 100) to the expenditure so calculated

$$= 100\,Se \div \left\{\frac{Se}{SE}\left(E_1 \times \frac{P_1}{p_1} + E_2 \times \frac{P_2}{p_2} + \cdots\right)\right\} = 100\,\frac{SE}{S\,E\frac{P}{p}} = 196.4 \qquad (f)$$

The weight applied here to the ratio $\frac{p}{P}$ is $QP^2 \div p$, and as in
case (d) gives undue weight to particular prices. Also there is
no reason to suppose that the expenditure is kept in a constant
ratio item by item.
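
Methods (e) and (f) can be retraced from the printed columns E and p/P of the table of budgets. The following Python sketch is an illustration under that assumption; because the printed figures are rounded, (f) comes out near 196 rather than exactly at the 196.4 of the text.

```python
# Columns E (1914 expenditure, pence) and p/P (price ratio) as printed.
E = [50.5, 58.5, 14.0, 7.5, 13.0, 16.5, 1.5, 7.5, 24.5,
     2.5, 11.0, 4.5, 2.5, 14.5, 1.5, 3.5, 13.0]
ratio = [1.56, 2.15, 2.24, 2.29, 4.00, 1.69, 1.42, 2.32, 2.07,
         2.01, 1.78, 1.82, 2.24, 1.56, 1.50, 1.69, 3.21]

# (e): simple, unweighted average of the price ratios.
index_e = 100 * sum(ratio) / len(ratio)

# (f): 100 * SE / S(E * P/p), a harmonic form with weights E.
index_f = 100 * sum(E) / sum(e_i / r_i for e_i, r_i in zip(E, ratio))

# The same (f) written as an arithmetic mean of the ratios with
# weights E * P/p, that is Q * P**2 / p.
weights_f = [e_i / r_i for e_i, r_i in zip(E, ratio)]
index_f_arith = 100 * sum(
    w_i * r_i for w_i, r_i in zip(weights_f, ratio)
) / sum(weights_f)

print(round(index_e, 1), round(index_f, 1))
```

The weights E ÷ (p/P) shrink as a price ratio grows, which is the sense in which (f) leans on particular prices.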

No agreement has been reached on the question which
method is the best for the measurement of retail prices; but
there are serious theoretical objections to (d), (e), (f). There is
nothing in general to choose between (a) and (b), but for this
purpose one year has the same claim to be included as the
other and we are therefore obliged to take a mean. Of the
various means the method (c) of averaging the quantities is the
most sensitive, is quite easy to compute, and on all grounds is to
be recommended.*

* This opinion is different from that expressed in former editions. For
further information see the bibliography in the article on Workmen's Budgets
in Palgrave's Dictionary of Political Economy.

The problem of measuring the movement of retail prices
has been generally confused with that of measuring the change
in cost of a standard (representing either minimum subsistence
or efficiency subsistence) with the items the same at both dates.
It is not proposed here to discuss such a measurement in detail,
but it should be realised that there is a continual change in
the prices and supply of the various commodities. For such
budgets it ought to be assumed that the same nourishment
(or more generally the same satisfaction) is obtained at each
date by the most economical purchases, so that the quantities
of those foods whose price has risen least or fallen most are
increased while others are diminished, and consequently an
upward movement is less and a downward movement greater
than that measured by method (a).*

[Marginal note: Further difficulties.]
There are still two further considerations which hinder
the complete solution of the problem. In all budgets rent is
an important item, and there seems no prospect
of obtaining any good estimate of the relation
between  increasing  rent  and  improving  accommodation,  allow¬ 
ing  for  the  benefits  of  public  expenditure  paid  by  rates  included 
in  rent.  Again,  if  we  consider,  not  how  money  is  spent,  but 
how  it  might  be  spent,  we  should  have  to  introduce  a  more 
general  factor ;  for  the  margin  which  remains  when  necessities 
are  satisfied  may  have  a  rapidly  growing  purchasing  power, 
as  the  products  of  machinery  increase  in  variety  and  diminish 
in  price ;  perhaps  the  calculated  fall  in  wholesale  prices  forms  a 
fair  measure  of  this  growth. 

[Marginal note: Index-numbers of consumption.]
Leaving this very difficult problem, let us return for a
moment to the measurement of a quantity more typical of
index-numbers.† If we have to measure the action of a cause,
which affects quantities which have no common measure, we
are still able to apply index-numbers. A general increase has
taken place in the consumption of imported goods, and if we
can measure this increase independently of any change in
price, we can use it for criticism of any measurement of a
movement in real wages. The only common measure of bread,
currants, cheese, meat, etc., of practical value is their price,
their weight being useless for the purpose; consequently
another method is necessary. If the quantities consumed year
by year of a number of such commodities are written down,
expressed as percentages of the consumption in any years (not
necessarily the same), we have series of numbers which only
need weighting to form the index-number required. We can in
this case verify that any logical choice of weights, based on
their value or their assumed importance, or even a random
system of weights, gives much the same index-number as the
simple arithmetic averages; in fact, we have a sufficiently
good group of samples to render us nearly independent of
weights. When this is the case we can say with safety that the
number required lies in the neighbourhood of the group given
by the various systems of weights, and choose what appears
the most logical system for the estimate we adopt. In the
paper referred to, five different systems applied to only
fourteen commodities give results for the increase of
consumption all between 13.8 and 20.1 per cent. in the period
1873-96.

* For the discussion of these questions see “Cost of Living,” Statistical
Journal, May 1919.

† The following illustration is based on Mr. G. H. Wood's paper on
“Some Statistics of Working Class Progress,” Statistical Journal, 1899.

[Marginal note: Wage index-numbers.]
The application of index-numbers to wage statistics does not
involve any fresh principles. It is not permissible to ignore
the change of weights in this case; for otherwise
we should not allow for the general tendency to
increase numbers where wages are rising. There is great
liability  to  “  biassed  ”  errors  in  separate  averages;  for  wages 
for  overtime,  specially  high  piece-wages,  wages  of  large  uncom¬ 
bined  classes  of  low-skilled  or  badly  paid  workpeople,  may  often 
be  omitted  in  wage  records.  These  biassed  errors,  however, 
tend  to  disappear  in  comparison;  and  it  may  prove  possible 
to  construct  a  wage  index-number  of  very  fair  precision.* 

*  For  a  complete  illustration  of  method  and  of  the  various  factors 
involved,  see  “The  Statistics  of  Wages  in  the  United  Kingdom.  Part  XIV.  : 
Engineering  and  Shipbuilding,"  Statistical  Journal,  March  1906,  pp.  154  seq., 
especially  pp.  166,  168  and  185. 
Section I. General.

[Marginal note: Necessity of interpolation.]
It is very often the case in practical statistics that we are
not able to make serial estimates as frequent or descriptions
of groups as detailed, as is necessary for their use
in further investigations. Thus the population is
only  counted  once  in  ten  years ;  but  we  need  to  bring  monthly 
and  annual  accounts — births,  deaths,  trade  returns,  etc. — into 
close  relation  to  the  existing  number  of  people,  and  estimates 
for  the  budget  and  the  yield  of  taxes  must  be  based  on  the 
assumed  number  of  taxpayers  for  the  current  year;  it  is 
therefore  necessary  to  interpolate  estimates  for  the  number  of 
the  people  in  intercensal  years.  Again,  interpolation  is  needed 
for  the  statement  of  the  distribution  of  the  population  accord¬ 
ing  to  age,  a  tabulation  which  is  necessary  for  actuarial  work 
and  for  sociological  purposes.  The  ages  returned  on  the 
householder’s  schedule  are  nominally  correct  to  the  year,  but 
in  practice  they  are  known  to  be  inaccurate,  tending  to  group 
themselves  in  the  neighbourhood  of  round  numbers ;  but  the 
returns for such age periods as 35-45 years are more correct,
since  the  persons  who  return  themselves  as  40  years  old  are 
probably  within  5  years  of  that  age.  The  original  returns 
are  so  erroneous  that  prior  to  1911  they  were  not  published, 
but  the  numbers  were  only  given  in  the  ten-yearly  periods; 
from  the  numbers  so  given,  it  is  necessary  to  estimate  the 
numbers  for  the  individual  years.  Again,  the  compilers  of  the 
wage  census  of  1886-91  enumerate  the  numbers  earning  wages 
“  of  15s.  and  under  20s.,”  “  of  20s.  and  under  25s.,”  and  so 
on,  but  not  the  numbers  in  shilling  limits.  In  problems 
relating  to  wages  we  often  need  more  detail ;  and  when  we  are 
comparing  these  wages  with  a  similar  group  in  France,  we 
must  devise  a  scheme  by  which  grades  of  2  francs  can  be  com¬ 
pared  with  grades  of  5s.,  by  a  suitable  system  of  interpolation. 
Such a necessity is very common when we wish to compare
groups,  which  are  similar  but  tabulated  on  diverse  systems. 
Thus,  two  countries  conduct  their  census  at  different  dates. 
In  one  country  the  age  groups  are  of  fifteen  years,  in  another 
of  ten;  in  one,  “young  persons”  are  those  under  21; 
in  another,  those  under  18.  Occasional  estimates  seldom 
correspond  in  date ;  wage  statistics  are  found  for  1840,  1850, 
and  1892  in  France,  and  for  1866,  1885,  1886,  and  1891  in 
England.  Similar  differences  are  found  when  we  are  com¬ 
paring  county  with  county ;  and  a  discussion  of  the  method  of 
determining  averages  in  such  a  case  will  illustrate  some  of  the 
elementary  problems  of  interpolation. 

[Marginal note: Elementary example.]
Suppose that the figures printed in Roman type in the
following table are accurate returns of the weekly
wages in three districts, and that we wish to find
the average change in the three together.


It is clear that there is something to be learnt about the
general course of wages from the data, but the lessons are not
obvious. The following figures, printed in the table in italics,
are those which naturally suggest themselves. There is no
sign in A of any change between 1862 and 1866, so we write
15s. for 1864. Judging from B, the figure for 1870 is not
likely to have been lower than that for 1864, so we write 15s.
for A in 1870. A is now complete; we notice that in A the
first rise was complete by 1862, and assuming the same in B,
we obtain 19s. for 1862. In C there is a rise between 1864
and 1866, while in A there is no change from 1866 to 1870;
B will correspond if we write 20s. in 1866. If we write for B,
19s. 6d. in 1871, 21s. in 1875, and 20s. 6d. in 1880, we shall
have close correspondence with A from 1866 to 1881. Similar
reasons lead to the numbers interpolated for C. The
unweighted average can then be calculated year by year, which
could not be done directly from the data. This average
reflects all the changes in the original figures and gives no special
predominance to any. It may be regarded as the most probable
series that can be based on the given information.

[Marginal note: Assumptions made.]
We will now notice the assumptions tacitly made in
proceeding by this method. First, it has been assumed that
there are no sudden jumps, that such a figure as
20s. for A 1864 is inadmissible; this is only justifiable
if we are acquainted with the general causes which influence
the  rate  of  wages,  and  know  that  there  was  no  violent  disturb¬ 
ance  in  the  intermediate  dates.  We  could  not  make  this 
assumption  as  to  wages  in  the  cotton  trade  in  the  time  of  the 
American Civil War, nor can we make it over a long series
of  years.  Secondly,  it  has  been  assumed  that  in  the  absence 
of  evidence  to  the  contrary  the  rise  or  fall  has  been  uniform. 
Thus,  in  B  1878-81,  the  wage  in  1880  is  assumed  to  be  inter¬ 
mediate  between  1878  and  1881 ;  if  there  had  been  no  indica¬ 
tion  from  A  that  it  was  half-way  between  in  point  of  wages, 
it  might  have  been  said  that  in  point  of  time  it  was  two-thirds 
of the way, and 20s. 8d. should be interpolated for 1879 and
20s. 4d. for 1880, if it was worth while to depart from round
numbers.  Thirdly,  it  has  been  assumed  that  the  course  of 
wages  in  the  three  districts  was  similar.  Thus  in  A  there  is 
a  rise  from  1860-62,  but  there  is  no  further  improvement  at 
any  rate  before  1866 ;  it  is  consequently  assumed  that  the  rise 
registered  in  B  and  C  before  1864  actually  took  place  before 
1862.  Again,  when  considering  the  period  1870-75,  we  notice 
that  in  A  there  is  a  fall  till  1871,  and  a  sharp  rise  to  1875,  and 
no  change  to  1878 ;  in  B,  therefore,  it  is  assumed  that  the  wage 
of  1875  is  equal  to  that  of  1878,  and  the  fall  in  1878  may  be 
allowed  because  it  increases  the  sharpness  of  the  rise  in  1871-75. 
In C it is doubtful whether the 12s. in 1871 should not rather
be 11s. 6d. The reasons against are that a gain on a low wage
is often not so easily lost as a gain on a high one; 6d. is a
larger drop proportionately on 12s. than on 15s.; that the rise
of 3s. 6d. which would then be shown 1871-75 is a larger
proportionate rise than in either A or B; and that the existence
of the fall in 1870-71 depends only on the evidence of a
fall between 1866-71. When the figures are few in number,
it is necessary to examine them in this way to pick out the
most  probable ;  and  it  is  often  fairly  easy  to  fill  in  the  figures 
which  satisfy  all  the  existing  evidence  fairly  closely. 

The  question  at  once  arises,  What  certainty  have  we  that 
these  quantities,  by  hypothesis  unknown,  are  in  reality  any¬ 
where  near  the  figures  which  on  the  face  are  most  probable  ? 

In some cases of interpolation, dealt with presently, the
answer can be given as a statement of mathematical probability,
such as: it is 2 to 1 against a divergence
of 6d. from the assigned figure, 30 to 1 against
one of 1s., 1000 to 1 against one of 2s. 6d., and so on; but
in  the  figures  most  often  cropping  up  in  investigations  it  is 
not  possible  to  assign  such  a  precise  probability.  There  is 
one  rough  but  useful  way  of  testing  the  accuracy  of  such 
interpolation  as  in  the  case  before  us  which  can  be  explained 
by  an  example.  Test  how  far  we  can  throw  out  our  calculated 
average  for  1870,  without  violently  infringing  the  common- 
sense  of  the  question.  Make  A  and  C  as  large  as  possible  in 
these dates; we may perhaps suppose a rise of 1s. above 1866,
seeing that there is one in B between 1864 and 1870. We can
hardly suppose either that 1870 is as high as 1875-78, or that
there is a great drop of as much as 2s. in the single year, if we
are acquainted with the causes that determine the wages at
those dates. Let the highest wage we can assign to A and C
be 16s. 6d. and 13s. 6d. respectively. Our average is then
16s. 8d. instead of 15s. 8d. Similarly, we might perhaps think
that  14s.  and  11s.  were  the  lowest  possible  in  A  and  C  in  1870 ; 
then  the  average  would  be  15s.  Assuming  that  we  know  enough 
about  the  general  trend  of  events  at  these  dates  to  assign  limits 
in  this  way,  we  can  say  it  appears  improbable  that  the  average 
wage  in  1870  was  less  than  15s.  or  more  than  16s.  8d.,  and  that 
the  evidence  points  to  15s.  8d. 
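
The limits just stated can be checked by a short calculation. The sketch below works in pence (12d. = 1s.) and takes B at 20s. in 1870; that figure is not stated in this passage and is an assumption, being the value implied by the averages quoted. The helper names are illustrative only.

```python
def pence(shillings, d=0):
    """Convert shillings and pence to pence (12d. = 1s.)."""
    return 12 * shillings + d

def as_s_d(total_pence):
    """Format a number of pence as shillings and pence, to the nearest penny."""
    s, d = divmod(round(total_pence), 12)
    return f"{s}s. {d}d."

def mean(values):
    return sum(values) / len(values)

B_1870 = pence(20)  # assumption: the value of B implied by the averages in the text

# A and C pushed to the highest, then the lowest, figures thought admissible.
highest = mean([pence(16, 6), B_1870, pence(13, 6)])
lowest = mean([pence(14), B_1870, pence(11)])

print(as_s_d(highest))  # 16s. 8d.
print(as_s_d(lowest))   # 15s. 0d.
```

The interval between the two printed figures is the range within which the interpolated average of 15s. 8d. is judged to lie.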

The  accuracy  of  our  interpolation  then  depends — (1)  On 
knowledge  of  the  possible  fluctuations  of  the  figures,  to  be 
obtained  by  a  general  inspection  of  the  fluctuations  at  dates 
for  which  they  are  given;  (2)  on  knowledge  of  the  course  of 
the  events  with  which  the  figures  are  connected. 

A second example of a similar kind * may be
given to illustrate the numerical calculation.

* Taken from “Agricultural Wages in England,” in the Statistical Journal,
December 1898, by the present author.
Weekly Agricultural Wages in Northern Counties.

                                 1867-69.       1869-70.
                                  s.  d.         s.  d.
Cheshire                        [13   1]        13   6
Lancashire                       15   0         15   0
West Riding of Yorkshire         14   6         16   5
East Riding of Yorkshire         14   6        [14  11]
North Riding of Yorkshire        14   6         15   4
Durham                           16   6         16   0
Northumberland                   16   6         16   7
Cumberland                      [14   4]        14   9
Westmoreland                    [15   7]        16   1

Roman figures given. Italic figures interpolated (shown here in square brackets).

The  averages  of  the  wages  in  the  five  districts  for  which 
data exist in both periods are 15s. 4.8d. in 1867-69 and 15s.
10.4d. in 1869-70, that is in the ratio 33 : 34. If we assume that
the wages in the other counties have been influenced by similar
causes and increased in the same ratio, we obtain the figures
interpolated in the table. The unweighted averages for the
northern counties are now 14s. 11d. and 15s. 5d. in the two
periods, instead of 15s. 3d. and 15s. 5d., the averages of the
given  numbers.  For  general  comparison  all  over  England 
between  these  two  years  we  should  have  been  obliged  to 
neglect  the  missing  counties  in  both  years,  which  would  have 
unfairly  lowered  the  general  average,  since  these  counties  have 
in  recent  times  had  wages  above  the  English  average  though 
below  that  of  the  northern  district.  At  the  same  time  we 
should  have  unfairly  raised  the  apparent  average  of  the  northern 
district.  We  should  also  have  lost  the  probable  figures  for  the 
special  counties  at  the  earlier  date  which  are  on  a  fairly  safe 
basis ;  for  the  wages  in  these  counties  of  the  Northern  District 
remain  in  nearly  the  same  order  through  the  last  fifty  years. 
At  the  same  time  it  is  easily  seen  that  these  wages  are  not  so 
accurately  known  as  those  not  interpolated,  and  it  is  well  to 
notice  in  arguments  based  on  such  figures,  to  what  extent  the 
interpolated  figures  are  involved. 
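
The ratio method of this example can be sketched as follows. Wages are entered in pence, and `None` marks a county with no return for the period; which entries are treated as missing is an assumption inferred from the averages quoted in the text, and the variable names are illustrative only.

```python
# (1867-69, 1869-70) wages in pence; None = no return for that period (assumed).
wages = {
    "Cheshire":       (None, 162),
    "Lancashire":     (180, 180),
    "West Riding":    (174, 197),
    "East Riding":    (174, None),
    "North Riding":   (174, 184),
    "Durham":         (198, 192),
    "Northumberland": (198, 199),
    "Cumberland":     (None, 177),
    "Westmoreland":   (None, 193),
}

# Ratio of later to earlier wages in the counties with returns at both dates.
complete = [(a, b) for a, b in wages.values() if a is not None and b is not None]
ratio = sum(b for _, b in complete) / sum(a for a, _ in complete)  # 952/924 = 34/33

# Fill each gap by applying the same ratio, to the nearest penny.
filled = {}
for county, (early, late) in wages.items():
    if early is None:
        early = round(late / ratio)
    if late is None:
        late = round(early * ratio)
    filled[county] = (early, late)

def as_s_d(p):
    """Format pence as shillings and pence, to the nearest penny."""
    s, d = divmod(round(p), 12)
    return f"{s}s. {d}d."

mean_early = sum(a for a, _ in filled.values()) / len(filled)
mean_late = sum(b for _, b in filled.values()) / len(filled)
print(as_s_d(mean_early), as_s_d(mean_late))  # 14s. 11d. 15s. 5d.
```

Keeping the interpolated entries apart from the given ones (here by the `None` markers) makes it easy to state how far any later argument rests on them.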

A  process  very  similar  to  that  just  employed  is  used  in 
giving  marks  at  school  to  students  who  are  absent  from  a 
lesson;  attention  is  paid  both  to  the  particular  student’s 
general  place  in  the  class  order,  and  to  the  average  value  of  the 
marks  obtained  by  the  rest  of  the  class  in  the  lesson  missed. 

Though the method be fairly complete it is very important
to notice that interpolated figures rest on quite a different class
of evidence to those which are the result of direct evidence.
In some cases they may represent quantities
which have no existence (as in the case of school
marks) and which are only used for convenience of
calculation. In others they are simply figures
adopted as those which in default of definite knowledge appear
most probable. They must always be clearly indicated as interpolations;
it is always well to state the method by which they
are obtained, and any subsidiary information which may be
regarded as direct evidence of their accuracy, and if practicable
they may be given not as exact, but as lying between certain
limits; thus the interpolated figures for Cheshire might be
written 12s. 6d. to 13s. 6d., instead of 13s. 1d.

Several  different  cases  are  met  with  in  interpolation,  some 
of  which  are  treated  algebraically  in  the  next  section,  while 
others  can  be  illustrated  at  once  by  numerical  examples. 

The Graphic Method. If we know the values of quantities
at isolated positions, such as the numbers of the population
at the ages 25 to 35, 35 to 45, etc.; the
population in 1871, 1881, 1891, etc.; wages in
1860, 1870, 1873, etc.; the numbers whose wages are from
15s. to 20s., 20s. to 25s., etc., we may represent the facts by
such a diagram as:

[Diagram: points showing the value of the quantity at the years 1860, 1866, 1870, 1877, 1880 and 1884.]

Suppose  that  we  need  the  value  of  the  quantity  in  1875. 
If we were only given the two points C and D, the simplest
hypothesis, and the one to be made in the absence of any
evidence to the contrary, is that the quantity increased
uniformly between C and D; representing such an increase
by the straight line C D, the height of the point x will represent
the quantity in 1875.

If the point E is also given, the hypothesis represented by
the straight lines C D, D E will not stand, for it assumes a
sudden break in the regularity at the point D in 1877, for which
there  is  no  evidence.  We  must  take  into  account  all  the 
points  given,  and  through  them  all  a  line  must  be  drawn  whose 
curvature  is  as  smooth  as  possible,  for  in  the  absence  of  evidence 
to  the  contrary,  sudden  changes  in  the  quantities  may  be 
assumed  not  to  exist.  Such  a  curve  can  be  constructed  on 
mathematical  principles,  or  may  be  drawn  freehand;  if  the 
latter,  it  will  often  be  quite  as  near  the  facts  as  the  arguments 
will  allow  us  to  go. 

This  method  only  applies  to  continuous  quantities,  such  as 
numbers  at  different  ages,  population  at  different  dates,  earners 
at  different  wages  in  a  very  large  group  of  wages.  Thus  for  all 
England  the  average  wage  must  change  gradually,  but  the 
wage  of  the  London  builders  changed  suddenly  as  the  result  of 
strikes  and  arrangements  at  certain  dates.  In  this  case  we 
must  draw  the  figure  to  correspond  as  closely  as  possible  to 
the  evidence,  such  as — 


where A B represents a sudden rise; B C a gradually accelerated
increase due to improving trade, C D a slow falling off from
the wage reached at C, and D E a determined and successful
effort to recover the lost ground.

Periodic  Figures. — If  we  know  the  annual  averages  of 
figures  which  have  a  yearly  period  and  a  sufficient  number  of 
monthly  averages  to  estimate  the  periodic  fluctuations  by  the 
method  described  on  pp.  160  seq.,  we  can  interpolate  figures  for 
any  month  for  which  the  returns  are  incomplete  with  fair 
accuracy.  Thus  if  we  are  dealing  with  the  numbers  of 
unemployed as given in the Labour Gazette, we find a periodicity
which is not very strongly marked in all the months, but there
is in general a fall in the spring and a rise in the late autumn,
and June is generally the minimum month. We can then
make use of the small diagrams on pp. 165-6, and, having marked
in  all  the  information  we  have,  draw  the  waves  on  the  rising, 
stationary,  or  descending  line  of  averages,  so  that  the  fluctu¬ 
ating  lines  shall  pass  through  all  the  given  points.  We  can 
obtain  an  idea  of  the  accuracy  of  the  resulting  figures  by  notic¬ 
ing  the  general  characteristics  of  the  given  figures ;  we  find 
that  the  percentage  unemployed  has  never  changed  more  than 
two  units  in  one  month,  that  there  are  no  fluctuations  which 
have  lasted  less  than  three  or  four  months,  and  that  the 
percentages  have  never  been  below  1  or  above  10.  Finally, 
we  can  look  at  the  trade  history  of  particular  dates,  and  in 
the  light  we  thus  obtain  reject  any  improbable  figures. 

Use  of  Subsidiary  Curves. — If  we  are  able,  by  the 
methods  described  in  Chapter  VII,  p.  158  or  p.  174,  to 
find  a  close  connection  between  two  series,  we  can  use  the 
more  complete  of  them  to  assist  the  interpolation  of  any  missing 
figures  in  the  other.  We  must  first  investigate  carefully  the 
closeness  and  nature  of  correspondence  at  the  dates  for  which 
we  have  complete  figures  in  both  series.  Then  we  can  draw 
diagrams,  similar  to  those  facing  p.  155,  one  of  the  lines  being 
incomplete.  Then  completing  the  broken  line,  so  as  to  bring 
it  into  as  close  resemblance  with  the  completed  line  as  the 
given  points  allow,  we  shall  obtain  the  most  probable  values 
for  the  missing  figures.  The  accuracy  of  the  result  can  be 
tested  as  in  the  previous  case.  This  method  may  reasonably 
be  used  in  interpolating  figures  for  the  yield  from  one  source  of 
revenue  by  means  of  the  yield  from  another ;  for  the  value  of 
exports  from  that  of  imports ;  for  the  marriage  rate  from  foreign 
trade;  for  the  wages  in  one  district  from  those  in  another; 
for  the  number  of  unemployed  from  the  changes  in  consumption 
of  foods ;  for  changes  in  parts  of  the  population,  when  we  know 
the  changes  in  the  whole,  and  for  many  other  series. 

Section  2. — Algebraic  Treatment. 

The  problem  of  interpolation  to  which  most  attention  has 
been  given  may  be  stated  as  follows  : — When  one  quantity  is 
subject  to  continuous  regular  change,  and  a  second  quantity 
changes in connection with it, and we know or can estimate
directly only some discontinuous values of this second quantity,
it  is  required  to  find  the  probable  values  of  the  second  quantity 
which  correspond  to  given  values  of  the  first :  for  instance, 
given  the  expectation  of  life  at  the  ages  15,  20,  25,  etc.,  it  is 
required  to  find  it  for  intermediate  ages;  given  the  popula¬ 
tion  of  the  country  in  1871,  1881,  1891,  1901,  find  it  at  inter¬ 
mediate  dates.  The  only  permissible  assumptions  are  that 
the  quantity  changes  continuously,  that  is  with  no  break 
at  any  figure,  and  that  the  rate  of  change  of  the  quantity  is  also 
continuous,  that  is  that  the  line  representing  its  value  is  not 
angular,  but  smooth.  The  problem  can  only  be  attacked 
systematically  by  the  use  of  the  algebraic  method  of  finite 
differences,  and  it  is  necessary  to  begin  with  definitions  of 
notation  and  to  obtain  certain  fundamental  formulae. 

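1. Let $y$ be a continuous function of $x$, and let $y_0, y_1, y_2, \ldots$
be the values of $y$ when $x = x_0, x_1, x_2, \ldots$

Arrange a table thus:

\[
\begin{array}{ccccc}
\text{Values of } x. & \text{Values of } y. & \text{First Differences.} & \text{Second Differences.} & \text{Third Differences.}\\
x_0 & y_0 \\
 & & \Delta_0^1 \\
x_1 & y_1 & & \Delta_0^2 \\
 & & \Delta_1^1 & & \Delta_0^3 \\
x_2 & y_2 & & \Delta_1^2 \\
 & & \Delta_2^1 & & \Delta_1^3 \\
x_3 & y_3 & & \Delta_2^2 \\
 & & \Delta_3^1 \\
x_4 & y_4 \\
\vdots & \vdots & \vdots & \vdots & \vdots
\end{array}
\]

Here each $\Delta$ is obtained by subtracting the entry just higher
than it in the previous column from that just lower than it; e.g.,
$\Delta_0^1 = y_1 - y_0$, $\Delta_1^1 = y_2 - y_1$, $\ldots$ $\Delta_0^2 = \Delta_1^1 - \Delta_0^1$, $\ldots$ $\Delta_0^3 = \Delta_1^2 - \Delta_0^2$ $\ldots$
In this notation the upper index is the order of the difference (not a power)
and the lower index its position in the column. The table may be supposed
to continue indefinitely downwards and to the right.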

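We have at once

\[
\begin{aligned}
\Delta_0^2 &= \Delta_1^1 - \Delta_0^1 = (y_2 - y_1) - (y_1 - y_0) = y_2 - 2y_1 + y_0,\\
\Delta_t^2 &= y_{2+t} - 2y_{1+t} + y_t, \quad \text{where } t \text{ is any integer},\\
\Delta_0^3 &= (y_3 - 2y_2 + y_1) - (y_2 - 2y_1 + y_0) = y_3 - 3y_2 + 3y_1 - y_0,\\
\Delta_t^3 &= y_{3+t} - 3y_{2+t} + 3y_{1+t} - y_t,
\end{aligned}
\]

and generally, by an induction similar to that commonly used
in the proof of the Binomial Theorem and involving the same
coefficients,

\[
\Delta_0^r = y_r - r\,y_{r-1} + \frac{r(r-1)}{1\cdot 2}\,y_{r-2} - \frac{r(r-1)(r-2)}{1\cdot 2\cdot 3}\,y_{r-3} + \ldots \text{ to } r+1 \text{ terms} \qquad (\alpha)
\]

and

\[
\Delta_t^r = y_{r+t} - r\,y_{r+t-1} + \frac{r(r-1)}{1\cdot 2}\,y_{r+t-2} - \ldots \text{ to } r+1 \text{ terms}, \qquad (\beta)
\]

where $r$ is any integer.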

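We have also

\[
\begin{aligned}
y_1 &= y_0 + \Delta_0^1,\\
y_2 &= y_1 + \Delta_1^1 = (y_0 + \Delta_0^1) + (\Delta_0^1 + \Delta_0^2) = y_0 + 2\Delta_0^1 + \Delta_0^2,
\end{aligned}
\]

since $\Delta_1^1 = \Delta_0^1 + \Delta_0^2$; and similarly

\[
\begin{aligned}
\Delta_2^1 &= \Delta_1^1 + \Delta_1^2 = (\Delta_0^1 + \Delta_0^2) + (\Delta_0^2 + \Delta_0^3) = \Delta_0^1 + 2\Delta_0^2 + \Delta_0^3,\\
\therefore\ y_3 &= y_2 + \Delta_2^1 = y_0 + 3\Delta_0^1 + 3\Delta_0^2 + \Delta_0^3,
\end{aligned}
\]

and similarly $\Delta_3^1 = \Delta_0^1 + 3\Delta_0^2 + 3\Delta_0^3 + \Delta_0^4$.

Continuing this process we again have the Binomial Coefficients,
so that

\[
y_r = y_0 + r\,\Delta_0^1 + \frac{r(r-1)}{1\cdot 2}\,\Delta_0^2 + \ldots \text{ to } r+1 \text{ terms}, \qquad (\gamma)
\]

\[
\Delta_r^t = \Delta_0^t + r\,\Delta_0^{t+1} + \frac{r(r-1)}{1\cdot 2}\,\Delta_0^{t+2} + \ldots \text{ to } r+1 \text{ terms}, \qquad (\delta)
\]

and starting further down the scale

\[
y_{r+s} = y_s + r\,\Delta_s^1 + \frac{r(r-1)}{1\cdot 2}\,\Delta_s^2 + \ldots \text{ to } r+1 \text{ terms}, \qquad (\epsilon)
\]

where $s$ is any integer.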


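For example, let $y = x^4$, and let the values of $x$ be $0, h, 2h, 3h, \ldots$

\[
\begin{array}{cc|ccccc}
 & & \multicolumn{5}{c}{\text{Differences.}}\\
\text{Values of } x. & \text{Values of } y. & \text{First.} & \text{Second.} & \text{Third.} & \text{Fourth.} & \text{Fifth.}\\
\hline
0 & 0 \\
 & & h^4 \\
h & h^4 & & 14h^4 \\
 & & 15h^4 & & 36h^4 \\
2h & 16h^4 & & 50h^4 & & 24h^4 \\
 & & 65h^4 & & 60h^4 & & 0 \\
3h & 81h^4 & & 110h^4 & & 24h^4 \\
 & & 175h^4 & & 84h^4 & & 0 \\
4h & 256h^4 & & 194h^4 & & 24h^4 \\
 & & 369h^4 & & 108h^4 \\
5h & 625h^4 & & 302h^4 \\
 & & 671h^4 \\
6h & 1296h^4
\end{array}
\]

Formula ($\alpha$) gives $\Delta_0^4 = (256 - 4 \times 81 + 6 \times 16 - 4 \times 1 + 0)h^4 = 24h^4$, where $r$ is taken as 4.

Formula ($\beta$) gives $\Delta_2^5 = (7^4 - 5 \times 6^4 + 10 \times 5^4 - 10 \times 4^4 + 5 \times 3^4 - 2^4)h^4 = 0$, where $r = 5$, $t = 2$.

Formula ($\gamma$) gives $(5h)^4 = (0 + 5 + 10 \times 14 + 10 \times 36 + 5 \times 24 + 0)h^4 = 625h^4$, where $r = 5$.

Formula ($\delta$) gives $\Delta_2^3 = (36 + 2 \times 24 + 0)h^4 = 84h^4$, where $r = 2$, $t = 3$,

and

Formula ($\epsilon$) gives $(5h)^4 = (16 + 3 \times 65 + 3 \times 110 + 84)h^4 = 625h^4$, where $r = 3$, $s = 2$.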
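
The table and the five checks above can be reproduced mechanically. The following is a minimal sketch with $h = 1$; the function names are illustrative, and the list `d` is indexed so that `d[r][t]` is the $r$th difference at position $t$. The values of $y$ are carried one step beyond the printed table because the check of formula (β) uses $7^4$.

```python
from math import comb

def difference_table(ys):
    """Return [ys, first differences, second differences, ...]."""
    table = [list(ys)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

ys = [x ** 4 for x in range(8)]   # y = x^4 at x = 0, 1, ..., 7 (h = 1)
d = difference_table(ys)          # d[r][t] is the r-th difference at position t

def beta(ys, r, t=0):
    """Formula (beta): r-th difference at position t from the y's; t = 0 gives (alpha)."""
    return sum((-1) ** k * comb(r, k) * ys[r + t - k] for k in range(r + 1))

def epsilon(d, r, s=0):
    """Formula (epsilon): y_{r+s} from the differences at position s; s = 0 gives (gamma)."""
    return sum(comb(r, k) * d[k][s] for k in range(r + 1))

def delta(d, r, t):
    """Formula (delta): t-th difference at position r from the differences at position 0."""
    return sum(comb(r, k) * d[t + k][0] for k in range(r + 1))

print(d[1][:6])          # [1, 15, 65, 175, 369, 671]
print(beta(ys, 4))       # 24
print(beta(ys, 5, 2))    # 0
print(epsilon(d, 5))     # 625
print(delta(d, 2, 3))    # 84
print(epsilon(d, 3, 2))  # 625
```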
2. If the relation between $y$ and $x$ is of the form

\[
y = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n,
\]

and the values of $x$ are in Arithmetic Progression, viz. $x_0,\ x_0 + h,\ \ldots\ x_0 + (n-1)h$,
then it can be shown that $\Delta_0^n = a_n \cdot h^n\, n!$,
and that there are no higher differences.

For

\[
\begin{aligned}
\Delta_0^1 &= a_0 - a_0 + a_1(x_0 + h - x_0) + \ldots + a_n\{(x_0 + h)^n - x_0^n\}\\
&= h a_1 + \ldots + a_n\{n h x_0^{n-1} + \text{lower powers of } x_0\},\\
\Delta_1^1 &= h a_1 + \ldots + a_n\{n h (x_0 + h)^{n-1} + \text{lower powers of } (x_0 + h)\},\\
\Delta_0^2 &= 2h^2 a_2 + \ldots + a_n\{n(n-1) h^2 x_0^{n-2} + \text{lower powers of } x_0\}.
\end{aligned}
\]

Thus $\Delta_0^1$, $\Delta_0^2$ contain no higher powers than $x_0^{n-1}$ and $x_0^{n-2}$
respectively.

Continuing this process,

\[
\Delta_0^n = a_n\, n(n-1) \ldots 3 \cdot 2 \cdot 1\; h^n = a_n h^n\, n! \qquad (\zeta)
\]

and $\Delta_0^{n+1}$ and higher differences disappear.

In the example above, where $y = x^4$, $a_n = 1$, $n = 4$, $\Delta_0^4 = 1 \cdot h^4 \cdot 4! = 24h^4$, and $\Delta_0^5 = 0$.

Conversely if we assume that there is no difference above
the $n$th, it is shown in the following note that the equation
between $y$ and $x$ is of the form $y = a_0 + a_1 x + \ldots + a_n x^n$.
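
Result (ζ) is easily tried on a polynomial other than $x^4$. The sketch below uses a cubic and a spacing chosen purely for illustration (they are not from the text) and checks that the third differences all equal $a_n h^n\, n!$ while the fourth vanish.

```python
from math import factorial

def nth_differences(ys, n):
    """Apply the differencing step n times to the list ys."""
    row = list(ys)
    for _ in range(n):
        row = [b - a for a, b in zip(row, row[1:])]
    return row

coeffs = [5, -1, 0, 2]   # hypothetical cubic y = 5 - x + 2x^3 (a_0 ... a_3)
x0, h, n = 1, 2, 3       # hypothetical starting value and spacing

xs = [x0 + k * h for k in range(n + 2)]
ys = [sum(a * x ** i for i, a in enumerate(coeffs)) for x in xs]

third = nth_differences(ys, n)        # each entry should equal a_n * h^n * n!
fourth = nth_differences(ys, n + 1)   # should vanish
expected = coeffs[n] * h ** n * factorial(n)

print(third, fourth, expected)  # [96, 96] [0] 96
```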

Note. The relation between Differences and Derived Functions (or Differential
Coefficients) is very important in the theory of the former, and can be
exhibited concisely by the method of operators.

Using the usual notation of the calculus, we have by Taylor’s Theorem

\[
f(x+h) = f(x) + h f'(x) + \tfrac{1}{2!} h^2 f''(x) + \ldots = e^{hD} f(x),
\]

where $D$ stands for the operation of differentiation, and $e^{hD}$ is to be expanded into
$1 + hD + \tfrac{1}{2!} h^2 D^2 + \ldots$ and then applied term by term to $f(x)$. The use of $D$ as an
algebraic symbol is justified because of the relationships $D\{Df(x)\} = D^2 f(x)$,
$D^m\{D^n f(x)\} = D^{m+n} f(x)$, $aD(f(x)) = D(af(x))$, etc.

Now $\Delta f(x) = f(x+h) - f(x) = (e^{hD} - 1) f(x)$.

$\Delta\{af(x)\} = a\Delta f(x)$, $\Delta\{\Delta f(x)\} = \Delta^2 f(x)$, $\Delta^m(\Delta^n f(x)) = \Delta^{m+n} f(x)$, and $\Delta$ can be
used as an algebraic symbol.

Hence

\[
\begin{aligned}
\Delta &= e^{hD} - 1,\\
\Delta^n &= (e^{hD} - 1)^n = (hD + \tfrac{1}{2} h^2 D^2 + \ldots)^n = h^n D^n (1 + \tfrac{1}{2} hD + \tfrac{1}{6} h^2 D^2 + \ldots)^n\\
&= h^n D^n \Bigl(1 + \frac{n}{2} hD + \frac{n(3n+1)}{24} h^2 D^2 + \ldots\Bigr), \qquad \text{(i)}
\end{aligned}
\]

and

\[
\begin{aligned}
hD &= \log(1 + \Delta),\\
h^n D^n &= \{\log(1 + \Delta)\}^n = (\Delta - \tfrac{1}{2}\Delta^2 + \tfrac{1}{3}\Delta^3 - \ldots)^n\\
&= \Delta^n \Bigl(1 - \frac{n}{2}\Delta + \frac{n(3n+5)}{24}\Delta^2 + \ldots\Bigr). \qquad \text{(ii)}
\end{aligned}
\]

Now if $f(x) = a_0 + a_1 x + \ldots + a_n x^n$, $D^n f(x) = a_n \cdot n!$, and $D^{n+1} f(x) = 0 = D^{n+2} f(x) \ldots$

$\therefore\ \Delta^n f(x) = h^n a_n\, n!$, and $\Delta^{n+1} f(x) = h^{n+1} D^{n+1}(1 + \ldots) f(x) = 0$ from equation (i),
as in the text.

Conversely if $\Delta^{n+1} f(x) = 0 = \Delta^{n+2} f(x) = \ldots$, then from (ii) $D^{n+1} f(x) = 0 \ldots$,
$D^n f(x) = \text{const} = c_n$, $D^{n-1} f(x) = c_n x + c_{n-1}$, $D^{n-2} f(x) = \tfrac{1}{2} c_n x^2 + c_{n-1} x + c_{n-2}, \ldots$
and $f(x) = \tfrac{1}{n!} c_n x^n + \ldots + c_1 x + c_0$.

Hence if the $n$th difference is constant, the function is rational, integral,
and of the $n$th degree.

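Newton's interpolation formula, discussed below ($\kappa$), can quickly be obtained
by the use of operators; thus

\[
\begin{aligned}
y = f(x_0 + k) &= e^{kD} f(x_0) = (1 + \Delta)^{k/h} f(x_0), \quad \text{since } e^{hD} = 1 + \Delta,\\
&= f(x_0) + \frac{k}{h}\,\Delta f(x_0) + \frac{\frac{k}{h}\bigl(\frac{k}{h} - 1\bigr)}{1 \cdot 2}\,\Delta^2 f(x_0) + \ldots\\
&= y_0 + \frac{x - x_0}{h}\,\Delta_0^1 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h}\,\Delta_0^2 + \ldots, \quad \text{where } x = x_0 + k.
\end{aligned}
\]

When the $n$th difference (or the $n$th derived function) is
zero, formula ($\beta$) shows that

\[
y_{n+t} - n\,y_{n-1+t} + \frac{n(n-1)}{1 \cdot 2}\,y_{n-2+t} - \ldots \pm y_t = 0 \qquad (\eta)
\]

for all values of $t$.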

3. The common formula of interpolation depends on the
assumption that a continuous function, $y = f(x)$, can represent
the observations in the neighbourhood of the positions for which
values are to be found.

Assuming that the function can be expanded in powers
of $x$, as is generally the case with continuous functions,* we
may write

\[
y = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n, \qquad (\theta)
\]

where $n$, the index of the highest power of $x$, is still to be decided.
By proper choice of $a_0, a_1 \ldots a_n$ this equation can be satisfied
by any $(n+1)$ pairs of values of ($x$ and $y$). Thus for the straight
line $y = a_0 + a_1 x$, two points (or pairs of values) can be chosen, for
the parabola $y = a_0 + a_1 x + a_2 x^2$ three points, and so on.

The simplest form is $y = a_0 + a_1 x$, and the use of this
assumes  that  interpolation  by  proportional  parts  (the  method 
generally  employed  in  using  logarithmic,  trigonometric  and 
other  mathematical  tables)  is  sufficiently  accurate.  In  this 
case  the  first  difference  and  the  first  derived  function  (or 
gradient)  are  constant. 

The  parabola  takes  account  of  three  values,  and  its  use 
assumes  a  uniform  change  of  gradient,  the  second  difference 
and  the  second  derived  function  being  constant. 

The introduction of further terms allows for variation of
higher differences, and the closing of the expansion at the nth
term corresponds to constancy of the nth difference.

* More exactly, for functions which are continuous, and whose derived
functions of all orders are continuous, and not infinite, at the values of x in
question.

If  the  problem  is  to  interpolate  in  a  known  mathematical 
function  we  can  test  how  far  the  neglect  of  the  variation  in 
the  nth  difference  can  affect  the  calculation.  Thus  in  the 
7-figure  logarithm  table  we  have — 


                                          Differences.
Number.  Logarithm.    First.      Second.      Third.      Fourth.      Fifth.

  20     1.3010300
                      .0211893
  21     1.3222193                -.0009859
                      .0202034                +.0000876
  22     1.3424227                -.0008983                -.0000110
                      .0193051                +.0000766                +.0000015
  23     1.3617278                -.0008217                -.0000095
                      .0184834                +.0000671                +.0000015
  24     1.3802112                -.0007546                -.0000080
                      .0177288                +.0000591                +.0000016
  25     1.3979400                -.0006955                -.0000064
                      .0170333                +.0000527
  26     1.4149733                -.0006428
                      .0163905
  27     1.4313638

Here the successive differences diminish regularly and the
sixth difference is not greater than .0000001.

In  applications  to  statistics  we  do  not  in  general  know 
the  function  and  we  have  to  assume  that  it  exists  and  can  be 
expanded  in  a  series  whose  convergence  is  sufficiently  rapid 
to  allow  us  to  neglect  all  terms  after,  say,  the  fifth,  or,  put 
less  accurately,  we  assume  that  the  causes  which  produce  the 
totals  have  effects  which  change  gradually  from  point  to 
point,  so  that  the  variation  of  these  changes  is  but  slight  over 
a  small  region. 

4. Let $y_0, y_1 \ldots y_n$ be the values of $y$ which correspond to
equally spaced values of $x$, viz. $x_0,\ x_0 + h,\ x_0 + 2h \ldots x_0 + nh$.

Then the coefficients in equation ($\theta$) can be determined, but
the arithmetic work is very arduous, and a more useful form
is obtained in terms of differences.

Consider the equation

\[
y = y_0 + \frac{x - x_0}{h}\,\Delta_0^1 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h}\,\Delta_0^2 + \frac{x - x_0}{h} \cdot \frac{x - x_0 - h}{2h} \cdot \frac{x - x_0 - 2h}{3h}\,\Delta_0^3 + \ldots \text{ to } n+1 \text{ terms} \qquad (\kappa)
\]

(Newton’s formula)

If $x = x_0$, $y = y_0$.

If $x = x_0 + h$, $y = y_0 + \Delta_0^1 = y_1$.

If $x = x_0 + 2h$, $y = y_0 + 2\Delta_0^1 + \Delta_0^2 = y_2$.

If $x = x_0 + rh$, $y = y_0 + r\,\Delta_0^1 + \frac{r(r-1)}{1 \cdot 2}\,\Delta_0^2 + \ldots$ to $r+1$ terms, the
subsequent terms vanishing, and therefore by equation ($\gamma$), $y = y_r$.
Hence ($\kappa$), which is easily seen to be of the $n$th degree, is
satisfied by the $n + 1$ pairs of values in question.

E.g., to find $y = \log 20.5$ from the table above.

$x_0 = 20$, $h = 1$, $x = x_0 + .5$, $y_0 = 1.3010300$, $\Delta_0^1 = .0211893$, etc.

\[
\begin{aligned}
y = 1.3010300 &+ .5 \text{ of } .0211893 + \tfrac{1}{2}(.5)(-.5)(-.0009859) + \tfrac{1}{6}(.5)(-.5)(-1.5) \text{ of } .0000876\\
&+ \tfrac{1}{24}(.5)(-.5)(-1.5)(-2.5)(-.0000110) + \tfrac{1}{120}(.5)(-.5)(-1.5)(-2.5)(-3.5) \text{ of } .0000015.
\end{aligned}
\]

Using the first two terms, we have   y = 1.3116247.
Using the first three terms, we have y = 1.3117479.
Using the first four terms, we have  y = 1.3117534.
Using the first five terms, we have  y = 1.3117538.
Using all terms, we have             y = 1.3117538.

The true value is 1.3117539.
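
The successive approximations above can be reproduced by summing Newton's formula term by term. The sketch below takes the eight tabulated logarithms as given; the function names are illustrative, and ordinary floating-point arithmetic stands in for the seven-figure hand work, so the last digit may differ by a unit from the printed values.

```python
from math import log10

# Seven-figure logarithms of 20, 21, ..., 27, as tabulated in the text.
LOGS = [1.3010300, 1.3222193, 1.3424227, 1.3617278,
        1.3802112, 1.3979400, 1.4149733, 1.4313638]

def leading_differences(ys):
    """Return [y_0, first, second, ... differences at position 0]."""
    lead, row = [], list(ys)
    while row:
        lead.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return lead

def newton_forward(x, x0, h, ys, terms):
    """Sum the first `terms` terms of Newton's formula."""
    lead = leading_differences(ys)
    u = (x - x0) / h
    total, coeff = 0.0, 1.0
    for k in range(terms):
        total += coeff * lead[k]
        coeff *= (u - k) / (k + 1)   # next coefficient u(u-1)...(u-k)/(k+1)!
    return total

for terms in range(2, 7):
    print(terms, f"{newton_forward(20.5, 20, 1, LOGS, terms):.7f}")
print("true", f"{log10(20.5):.7f}")
```

Each additional term here brings in one further difference from the top diagonal of the table, which is why the approximations settle so quickly when the differences diminish regularly.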


Applications  to  statistical  data  are  given  below,  p.  233. 

5. Conversely, if we know $y$, we have an equation for $x$,
which can be solved by Horner’s method or otherwise.

Thus to determine the median using four observations we
may proceed as follows. Let there be $y_0, y_1, y_2, y_3$ persons
whose wages are less than $x_0$, $x_0 + h$, $x_0 + 2h$, $x_0 + 3h$ units
respectively, and let there be $(2y_m - 1)$ persons all together,
so that the value of $x$, $x_m$, corresponding to $y_m$ is the median.

Then

\[
y_m = y_0 + \frac{x_m - x_0}{h}\,\Delta_0^1 + \frac{x_m - x_0}{h} \cdot \frac{x_m - x_0 - h}{2h}\,\Delta_0^2 + \frac{x_m - x_0}{h} \cdot \frac{x_m - x_0 - h}{2h} \cdot \frac{x_m - x_0 - 2h}{3h}\,\Delta_0^3,
\]

a cubic equation to determine $x_m$.

We are free, of course, to take $x_0$ as the beginning of any
grade we please, and it should be so chosen that the median
is in the central grade included in the interpolation. Thus if
we use the cubic equation just written the grade $x_0 + h$ to
$x_0 + 2h$ should be that containing the median.

The formula on p. 107 (2) is obtained by neglecting the 2nd
and higher differences, and taking the grade $x_0$ to $x_0 + h$ to
include the median. Then $y_m = y_0 + \frac{x_m - x_0}{h}(y_1 - y_0)$, and
therefore

\[
x_m = x_0 + \frac{y_m - y_0}{y_1 - y_0} \cdot h.
\]
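
Both ways of locating the median can be sketched numerically. The cumulative counts below are hypothetical, chosen only so that the median falls in the central grade, and simple bisection is used in place of Horner's method for solving the cubic; the function names are illustrative.

```python
def newton_cubic(x, x0, h, y):
    """First four terms of Newton's formula through cumulative counts y[0..3]."""
    d1 = y[1] - y[0]
    d2 = y[2] - 2 * y[1] + y[0]
    d3 = y[3] - 3 * y[2] + 3 * y[1] - y[0]
    u = (x - x0) / h
    return y[0] + u * d1 + u * (u - 1) / 2 * d2 + u * (u - 1) * (u - 2) / 6 * d3

def median_cubic(x0, h, y, y_m, tol=1e-9):
    """Solve newton_cubic(x) = y_m by bisection within the grade x0+h to x0+2h."""
    lo, hi = x0 + h, x0 + 2 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if newton_cubic(mid, x0, h, y) < y_m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def median_linear(x_lo, h, y_lo, y_hi, y_m):
    """Proportional parts within the grade that contains the median."""
    return x_lo + (y_m - y_lo) / (y_hi - y_lo) * h

# Hypothetical data: persons earning less than 15s., 20s., 25s., 30s.
y = [120, 310, 520, 640]
x0, h = 15, 5
y_m = 400                      # 2*y_m - 1 = 799 persons in all

x_cubic = median_cubic(x0, h, y, y_m)
x_linear = median_linear(20, h, y[1], y[2], y_m)   # median grade is 20s. to 25s.
print(round(x_cubic, 3), round(x_linear, 3))
```

Bisection relies on the cumulative curve rising throughout the central grade, which holds for these figures; the two estimates differ only by the allowance the cubic makes for the second and third differences.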


To find the mode we again take $y$ as the cumulative number
up to the value $x$. It is found that it is simplest and generally
sufficient to depend on four observations, such that the mode
is between the second and the third. We then use the first
four terms of equation ($\kappa$) and find for what value of $x$ the
curve is steepest and therefore the number of cases per unit
of the abscissa is greatest. $D_x y$ is to be a maximum, and
therefore $D_x^2 y$ zero.

\[
0 = D_x^2 y = \frac{1}{h^2}\,\Delta_0^2 + \frac{x - x_0 - h}{h^3}\,\Delta_0^3.
\]

Hence

\[
x = x_0 + h - h\,\frac{\Delta_0^2}{\Delta_0^3} = x_0 + h + h\,\frac{u_2 - u_1}{(u_2 - u_1) + (u_2 - u_3)},
\]

where $u_1, u_2, u_3$ are written for $y_1 - y_0$, $y_2 - y_1$, $y_3 - y_2$ and are the
number of cases between $x_0$ and $x_0 + h$, $x_0 + h$ and $x_0 + 2h$, and
$x_0 + 2h$ and $x_0 + 3h$ respectively. If the mode is in the second
grade $u_2 > u_1$ and $u_2 > u_3$. The formula shows how the interval
$x_0 + h$ to $x_0 + 2h$ is to be divided to obtain the position of the
mode (see p. 100).

Here the fourth differences of the $y$’s, that is the third differences
of the $u$’s, are neglected.
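
The rule for dividing the fullest grade can be written as a single function. The frequencies used below are hypothetical and the function name is illustrative; the check that the middle grade is the fullest mirrors the condition stated in the text.

```python
def mode_from_three_grades(x0, h, u1, u2, u3):
    """Position of the mode when the grade x0+h to x0+2h (frequency u2) is the fullest.

    u1, u2, u3 are the numbers of cases in three consecutive grades of width h,
    the first grade beginning at x0.
    """
    if not (u2 > u1 and u2 > u3):
        raise ValueError("the middle grade must have the greatest frequency")
    return x0 + h + h * (u2 - u1) / ((u2 - u1) + (u2 - u3))

# Hypothetical frequencies for the grades 15s.-20s., 20s.-25s., 25s.-30s.
print(round(mode_from_three_grades(15, 5, 190, 210, 120), 3))   # 20.909

# When the neighbouring grades are equally full, the mode is the mid-point.
print(mode_from_three_grades(0, 1, 3, 5, 3))                    # 1.5
```

The mode is drawn towards whichever neighbouring grade is the fuller, here the lower one.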


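6. Central Differences. In interpolation we generally have
to depend on those values of $y$ with regard to which the region
where we wish to ascertain values is centrally situated, and
formula ($\theta$) is in some respects awkward for that purpose.
Equivalent formulae, which avoid the want of symmetry, have
been devised, in which so-called “central differences” are used.
No new principle is involved, for these formulae are obtainable
by transformation from ($\theta$). The differences hitherto
used may be distinguished as “ascending differences.”

A suitable notation is as follows:

\[
\begin{array}{cccccc}
x_{-2} = x_0 - 2h & y_{-2} \\
 & & \delta_{-\frac{3}{2}} \\
x_{-1} = x_0 - h & y_{-1} & & \delta^2_{-1} \\
 & & \delta_{-\frac{1}{2}} & & \delta^3_{-\frac{1}{2}} \\
x_0 & y_0 & & \delta^2_0 & & \delta^4_0 \\
 & & \delta_{\frac{1}{2}} & & \delta^3_{\frac{1}{2}} \\
x_1 = x_0 + h & y_1 & & \delta^2_1 & & \delta^4_1 \\
 & & \delta_{\frac{3}{2}} & & \delta^3_{\frac{3}{2}} \\
x_2 = x_0 + 2h & y_2 & & \delta^2_2 \\
 & & \delta_{\frac{5}{2}} \\
x_3 = x_0 + 3h & y_3
\end{array}
\]

Here $\delta_{\frac{1}{2}} = y_1 - y_0$; $\delta^2_0 = \delta_{\frac{1}{2}} - \delta_{-\frac{1}{2}} = y_1 - 2y_0 + y_{-1}$;
$\delta^4_0 = y_2 - 4y_1 + 6y_0 - 4y_{-1} + y_{-2}$, etc.

Let the value of $x$ for which a value of $y$ is to be found divide
the interval $x_0$ to $x_1$ in the ratio $p : q$, so that $x = x_0 + ph = x_1 - qh$
and $p + q = 1$.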
Then it will be found by substitution that the formula

\[
y = p y_1 + q y_0 - \frac{pq}{6}\{(p+1)\delta^2_1 + (q+1)\delta^2_0\} + \frac{pq(p+1)(q+1)}{120}\{(p+2)\delta^4_1 + (q+2)\delta^4_0\}, \qquad (\lambda)
\]

which (by writing $q = 1 - p$) is seen to be a rational integral function
of the 5th degree in $p$, and similarly in $q$, is satisfied by the six
pairs of values $(x_{-2}, y_{-2})$, $(x_{-1}, y_{-1})$ $\ldots$ $(x_3, y_3)$; while, if the term
involving the 4th differences is omitted, the four pairs $(x_{-1}, y_{-1})$
$\ldots$ $(x_2, y_2)$ satisfy it.

As an illustration of the notation we may write $y_1 = \log 23$
in the table on p. 226, and taking $p = .2$ calculate $\log 22.2$.

\[
\begin{aligned}
\log 22.2 &= .2 \log 23 + .8 \log 22\\
&\quad - \frac{.16}{6}\{1.2 \text{ of } (-.0008217) + 1.8 \text{ of } (-.0008983)\}\\
&\quad + \frac{.16 \times 1.2 \times 1.8}{120}\{2.2 \text{ of } (-.0000095) + 2.8 \text{ of } (-.0000110)\}\\
&= 1.3462837 + .0000694 - .0000002 = 1.3463529.
\end{aligned}
\]

The true value is 1.3463530.
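
The central-difference calculation of $\log 22.2$ can be sketched as below. The function takes the six neighbouring values $y_{-2} \ldots y_3$ and the fraction $p$; its name and argument layout are illustrative, and floating-point arithmetic replaces the seven-figure hand work.

```python
# Seven-figure logarithms of 20 ... 25, as tabulated in the text.
LOGS = {20: 1.3010300, 21: 1.3222193, 22: 1.3424227,
        23: 1.3617278, 24: 1.3802112, 25: 1.3979400}

def central_interpolate(p, ys):
    """Interpolate at x_0 + p*h from ys = (y_-2, y_-1, y_0, y_1, y_2, y_3)."""
    ym2, ym1, y0, y1, y2, y3 = ys
    q = 1 - p
    d2_0 = y1 - 2 * y0 + ym1                           # second difference centred on x_0
    d2_1 = y2 - 2 * y1 + y0                            # second difference centred on x_1
    d4_0 = y2 - 4 * y1 + 6 * y0 - 4 * ym1 + ym2        # fourth difference centred on x_0
    d4_1 = y3 - 4 * y2 + 6 * y1 - 4 * y0 + ym1         # fourth difference centred on x_1
    return (p * y1 + q * y0
            - p * q / 6 * ((p + 1) * d2_1 + (q + 1) * d2_0)
            + p * q * (p + 1) * (q + 1) / 120 * ((p + 2) * d4_1 + (q + 2) * d4_0))

ys = [LOGS[n] for n in range(20, 26)]   # x_0 = 22, x_1 = 23, h = 1
print(f"{central_interpolate(0.2, ys):.7f}")
```

Because the formula is symmetrical in $(p, y_1)$ and $(q, y_0)$, the same function serves whichever end of the interval the required value lies nearer to.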


The  importance  of  the  formula  is,  however,  more  apparent 
when  we  have  no  general  algebraic  function,  but  wish  to 
interpolate  from  neighbouring  values  only. 


7.  Lagrange's  Formula. — The  formulae  (t)  (11)  (6)  (k)  and  (A) 
all  relate  to  the  case  where  the  observed  values  of  x  are  equi¬ 
distant  each  from  the  next.  There  is  no  such  simple  method 
of  interpolation  where  the  distances  are  not  equal.  An  equation 
is  given  by  Lagrange  which  is  of  the  nth  degree  and  satisfied 
the  n  +  1  pairs  of  values  (x0  y0),  (xx  y±)  .  .  .  (xn  yn)  what¬ 
ever  the  relation  between  the  x's  maybe,  and  it  maybe  written 
as  follows : — 


y=y0  (*—*l)  (*— *2)  •  •  •  )X~Xn) 


(Xo—X1){Xo—X2)  .  .  .  (Xo—Xn)~r'^1(x1  —  Xo)(x1  —  Xi)  .  .  .  (%^  —  Xn) 


(x—Xo){x—X2)  .  ■  .  (X—Xn) 


+  ...  +y 


ft 


(x—Xo){x  —  %1)  .  .  .  (x  —  Xn-i) 


(Xn  —  Xo){Xn  —  X1)  .  .  .  (Xn~ Xn-\) 


The numerator in any fraction, say the multiplier of $y_t$, is
obtained by multiplying the factors $(x - x_0)(x - x_1) \ldots
(x - x_n)$, omitting $x - x_t$; the denominator is obtained from
the numerator by writing $x_t$ for $x$.

It is evident that when $x = x_t$ every fraction is zero except
the multiplier of $y_t$, which is unity, and therefore $y = y_t$.
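
The rule just stated may be put compactly. The product sign and the index $s$ are introduced here for brevity and are not the author's notation:

```latex
y = \sum_{t=0}^{n} y_t \prod_{\substack{s=0 \\ s \neq t}}^{n} \frac{x - x_s}{x_t - x_s},
\qquad
\prod_{\substack{s=0 \\ s \neq t}}^{n} \frac{x_r - x_s}{x_t - x_s} =
\begin{cases} 1, & r = t, \\ 0, & r \neq t. \end{cases}
```

With $n = 1$ this reduces to $y = y_0 \frac{x - x_1}{x_0 - x_1} + y_1 \frac{x - x_0}{x_1 - x_0}$, the straight line through the two given points.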
8. We may now reconsider the assumptions made when we
took equation (θ) to express the relation between y and x.

If y and x are connected by any functional law, that is if
y is determinate for all given values of x, without which assumption
most problems of interpolation are meaningless, then y
can be expressed as a function of x, say $y = f(x)$. If the
function and its derivatives are continuous then by Maclaurin's
Theorem

$$y = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \frac{x^3}{3!} f'''(0) + \ldots \text{ continued indefinitely.}$$

If $f^{(n+1)}(0)$ and following coefficients are very small, and x
is never large, the terms from the (n + 2)nd onwards become
negligible in comparison with earlier terms, so that the first
n + 1 terms determine the value of y approximately. Now
by the equations (i) and (ii), p. 224, $f^{(n+1)}$ is small when $\Delta^{n+1}$,
$\Delta^{n+2}$, . . . are small, and vice versa. Hence we have the
following general statement: any functional relation between
y and x reduces to the parabolic equation of the nth degree (θ),
if the differences of orders higher than the nth vanish, and
if these differences do not vanish but are small, equation (θ)
is still an approximate expression for the relation.

Now  if  the  line  drawn  through  the  given  points  is  to  have 
continuous  and  slowly  changing  curvature,  it  is  easily  verified 
that  the  second  differences  for  points  near  together  are  not 
large,  for  a  rapid  change  in  the  rate  of  increase  of  the  ordinate 
means  a  rapid  change  of  curvature;  and  if  we  construct  a 
second  curve  with  the  same  abscissae  and  the  first  differences 
as  ordinates,  small  third  differences  will  indicate  absence  of 
rapid  change  in  the  first,  and  so  on ;  but  beyond  this  point 
it  is  not  easy  to  see  the  connection  between  the  hypothesis 
underlying  interpolation  and  the  diminution  of  successive 
differences.  The  converse,  however,  is  clearer ;  if  in  any 
series  of  figures  it  is  found  experimentally  that  the  successive 
differences  tend  to  disappear,  then  any  curve  which  passes 
through the points is expressed approximately by the parabolic
equation. De Morgan states this conclusion thus:
"If we take n points near each other, and having their abscissae
in arithmetic progression, with a small or at least not very
large common difference, and their ordinates not very unequal
. . . the parabola of the $(n-1)$th order will very nearly coincide
with any regular curve of the same general appearance, at
least between the same points." Boole's explanation is:
"It is customary to assume for the general expression of the
values under consideration a rational and integral function
of x, and to determine the constants by the given conditions.
This assumption rests upon the supposition (a supposition,
however, actually verified in the case of all tabulated functions*)
that the successive orders of differences rapidly diminish."

Since,  from  equation  (i),  p.  224,  when  h  is  small,  the  in¬ 
fluences  of  the  successive  differences  for  any  curve  are  smaller 
as  their  order  becomes  higher,  it  is  a  legitimate  process  to 
build  up  a  series  of  values  of  any  function  on  the  hypothesis 
that  the  higher  differences  vanish. 

If  a  freehand  curve  is  drawn  so  as  to  pass  through  the 
chosen  fixed  points,  and  to  have  curvature  which  changes  as 
slowly  as  possible,  a  line  will  be  obtained  which  lies  very  near 
that  given  by  equation  ( 6 ).  Such  a  line  would  be  similar  to 
the  track  of  a  bicyclist  who  was  riding  so  as  to  pass  over  several 
marks,  or  just  to  avoid  several  obstacles. 

9. It is clear from the above analysis that we can make a
smooth continuous curve pass through any number of points
we please; for with the parabolic equation (θ) there are never
any sudden jumps in the values of $y$, $\frac{dy}{dx}$ or $\frac{d^2y}{dx^2}$ as x changes
continuously; and we can obtain as many linear equations
(which have always real values) as there are constants, simply by
taking n in the original equation to be the number of fixed points.

If  we  have,  let  us  say,  10  points,  as — 


* That is, mathematical functions such as . . . , not statistical
approximations.
and wish to find a point on a fixed vertical line between F and
G, we can either take only F and G into consideration, and,
joining them by a straight line, obtain the point $x_1$; or considering
E, F, and G, or F, G, and H, draw parabolas and obtain
$x_2$ or $x_3$; or considering E, F, G, and H, draw a parabola of the
third order, which would have a point of inflexion near F;
this would be approximately the path a bicyclist might follow
if he had to start from E, and ride to a near point H, passing
close to F and G. If we now include D and K (if our bicyclist
has to start from D, pass E, F, G, and H, and reach K) we shall
modify the curvature throughout; and as we include more
and more points shall continue to affect slightly the path F G.
If  the  inclusion  of  the  nearer  points  tends  to  make  the  line  F  G 
approximate  more  and  more  closely  to  a  final  position,  while 
the  further  inclusion  of  the  more  distant  points  throws  it 
further  away,  we  may  conclude  that  the  positions  of  these 
further  points  are  not  governed  by  the  same  numerical  con¬ 
ditions  as  the  nearer  one.  Thus  in  a  “  table  of  survivals  ” 
the  figures  for  ages  under  5  years  are  not  distributed  in 
accordance  with  the  curve  determined  by  the  figures  for  higher 
ages ;  in  a  table  showing  wages,  it  may  be  seen  that  those  of 
highly  paid  workmen  are  not  governed  by  the  same  causes  as 
those  lower  in  the  scale.  On  the  other  hand,  the  number  in 
each  census  is  dependent  on  all  the  previous  numbers  for  more 
than one generation. In interpolating for the population of
1876 we shall obtain different figures according as we include
1851, '61, '71, '81, '91 only, or 1901 as well; and this is not
surprising, for a mistake made in 1876 may not come to light
till we have watched the growth of the population for twenty-five
years. It is clear that the points far from the period in
which the interpolation is to be done cannot be allowed so
much influence as those nearer, and it appears experimentally
that this condition is fulfilled in the method discussed; also,
in series (κ) the successive coefficients begin to diminish with
the rth term where $x < x_0 + (2r - 3)h$, that is with the coefficient
of the first difference when x is between $x_0$ and $x_0 + h$.
It may be noticed that the wanderings of the curve are limited
by the condition that a curve of the $(n-1)$th order cannot have
more than $n - 3$ points of inflexion, for $\frac{d^2y}{dx^2}$ has no term of a
higher degree than $x^{n-3}$.
In the above illustration the intermediate points from F to
G might be found from the five points D, E, F, G, H, or from
E, F, G, H, K. These two curves may be welded together
between F and G. The points near F are more accurately
determined by the first, of which it is the middle; those near
G by the second. The welding line should touch the first at
F, the second at G. This is conveniently done by the use of
the sine curve. This method is employed, I believe, at the
Registrar-General's office.

It  cannot  be  said  that  the  present  theory  of  statistical 
interpolation  rests  on  an  altogether  satisfactory  basis.*  The 
principles  which  govern  it  are  not  well  defined,  and  the 
mathematical  analysis  of  the  methods,  by  which  the  principles 
should  be  brought  into  relation  with  the  facts,  is  incomplete. 
Yet  it  is  perhaps  unnecessary  to  labour  after  more  refined 
methods,  for  interpolation  cannot  be  precise  unless  we  actually 
know  the  algebraic  expression  of  the  laws  which  govern  the 
figures,  and  the  method  here  discussed  is  found  to  satisfy  the 
conditions  empirically,  while  further  refinements  could  only 
introduce  slight  modifications. 

10.  Examples  showing  the  Numerical  Use  of  the  Formula. — 
(1)  Given  the  number  of  wage-earners  earning  sums  in  5s. 
groups,  to  estimate  the  number  earning  as  much  as  24s.  and 
not  so  much  as  25s. 


  Earning as much      *Numbers per 1,000              Differences.
  as 10s. and up to     Wage-Earners
                        (Adult males)       1st.     2nd.     3rd.     4th.

        15s.                  39
                                            257
        20s.                 296                      46
                                            303              -144
        25s.                 599                     -98               151
                                            205                 7
        30s.                 804                     -91                18
                                            114                25
        35s.                 918                     -66
                                             48
        40s.                 966

* General Report on Wages (C-6889; year 1893).

* This remark does not apply to the interpolation in evaluating
mathematical functions.
Neglect the increasing differences arising from the number
earning less than 15s.

Using formula (κ), $x_0 = 20$ (shillings), $h = 5$, $y_0 = 296$, $\Delta_0^1 = 303$,
$\Delta_0^2 = -98$, $\Delta_0^3 = 7$, $\Delta_0^4 = 18$.

At 25s., y = 599, from above table.

At 24s., x = 24,

$$y = 296 + \tfrac{4}{5} \text{ of } 303 + \tfrac{4}{5} \cdot \tfrac{-1}{10} \text{ of } (-98) + \tfrac{4}{5} \cdot \tfrac{-1}{10} \cdot \tfrac{-6}{15} \text{ of } 7 + \tfrac{4}{5} \cdot \tfrac{-1}{10} \cdot \tfrac{-6}{15} \cdot \tfrac{-11}{20} \text{ of } 18$$

$$= 296 + 242.4 + 7.84 + .224 + .3168 = 547 \text{ (nearly)}.$$

The required number is therefore 599 - 547 = 52.

Again at 23s., $x = x_0 + 3$, y = 489, and the number earning as
much as 23s. and not so much as 24s. is 58.
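
The series used in this example can be evaluated mechanically. The sketch below assumes only the leading differences quoted above; the function name `newton_forward` is introduced for illustration and is not the author's.

```python
def newton_forward(y0, diffs, u):
    """y0 + u*D1 + u(u-1)/2! * D2 + ... for leading differences D1, D2, ..."""
    total, coeff = y0, 1.0
    for k, d in enumerate(diffs, start=1):
        coeff *= (u - (k - 1)) / k      # builds u(u-1)...(u-k+1)/k!
        total += coeff * d
    return total

leading = [303, -98, 7, 18]             # differences at y0 = 296 (20s.), h = 5
at_24 = newton_forward(296, leading, (24 - 20) / 5)
at_23 = newton_forward(296, leading, (23 - 20) / 5)
print(round(at_24, 2), round(at_23, 2))
```

Evaluated in this way the fourth-difference term at 24s. is negative, since its coefficient contains three negative factors, and the sum is about 546.1; the 547 of the text corresponds to taking that term as positive. At 23s. the sketch gives about 489.3, agreeing with the 489 of the text.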


(2)  To  make  an  estimate  for  the  value  of  imports  in  the 
year  1813,  the  records  for  which  were  destroyed  by  fire. 


Given value of imports in:

    1810  -  -  £39,202,000  -  -  $y_1$.
    1811  -  -   26,510,000  -  -  $y_2$.
    1812  -  -   26,163,000  -  -  $y_3$.
    1813  -  -      . . .    -  -  $y_4$.
    1814  -  -   33,755,000  -  -  $y_5$.
    1815  -  -   32,987,000  -  -  $y_6$.
    1816  -  -   27,431,000  -  -  $y_7$.

From formulae (η), using $y_3$ and $y_5$ only, and assuming that
2nd differences vanish,

$$y_4 = 29{,}959.$$

From formulae (η), using $y_2$ and $y_6$ as well, and assuming that
4th differences vanish,

$$y_6 + y_2 - 4(y_5 + y_3) + 6y_4 = 0, \qquad y_4 = 30{,}029.$$

From formulae (η), using $y_1$ and $y_7$ as well, and assuming that
6th differences vanish,

$$y_7 + y_1 - 6(y_6 + y_2) + 15(y_5 + y_3) - 20y_4 = 0, \qquad y_4 = 30{,}421.$$

Here the first and second values are very near together,
while the third differs; hence we adopt £30,000,000 as the
value required.
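
The three estimates follow one pattern: the central difference of even order, centred on the missing year, is set equal to zero and solved for the middle term. A minimal sketch, with values in thousands of pounds and with the helper name `missing_middle` introduced here for illustration:

```python
from math import comb

def missing_middle(known, order):
    """Solve the vanishing central difference of the given even order for the
    unknown middle value; `known` maps offsets (-3 .. 3, without 0) to values."""
    half = order // 2
    total = 0
    for j in range(1, half + 1):
        # Binomial coefficients with alternating sign, counted from the middle.
        total += (-1) ** j * comb(order, half + j) * (known[j] + known[-j])
    return -total / comb(order, half)

# Offsets are years counted from 1813; values in thousands of pounds.
imports = {-3: 39202, -2: 26510, -1: 26163, 1: 33755, 2: 32987, 3: 27431}
estimates = {order: missing_middle(imports, order) for order in (2, 4, 6)}
print({order: round(value) for order, value in estimates.items()})
```

The three results round to 29,959, 30,029 and 30,421, as in the text.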

(3)  In  Mr.  Booth’s  Life  and  Labour  of  the  People ,  e.g., 
Vol.  V,  p.  46,  a  series  of  very  useful  diagrams  is  given  showing 
the  age  distribution  of  various  classes.  The  figures  he  uses 
are  as  follows  : — 
                      Proportion            Average at
                      occupied per          each year of
      Ages.           10,000 of total       age between
                      aged 10-80.           given limits.

  10-15 years  -  -      193.5                  38.7
  15-20   ,,   -  -      880                   176
  20-25   ,,   -  -      933                   186.6
  25-35   ,,   -  -     1636                   163.6
  35-45   ,,   -  -     1201                   120.1
  45-55   ,,   -  -      830                    83
  55-65   ,,   -  -      434                    43.4
  65-80   ,,   -  -      192.5                  12.8

His  diagram  is  drawn  from  the  last  column,  the  numbers  in 
which  form  the  ordinates  for  the  middle  of  the  corresponding 
age  periods.  The  points  so  obtained  are  joined  by  straight 
lines.  This  method  is  sufficiently  accurate  for  his  purpose, 
but  it  will  afford  an  interesting  example  of  interpolation  if  we 
obtain  some  of  the  figures  for  intermediate  years  more  closely. 


                      Proportion occupied
      Age.            per 10,000 under
                      x years.

    15 = $x_1$   -  -     193.5 = $y_1$
    20 = $x_2$   -  -    1073.5 = $y_2$
    25 = $x_3$   -  -    2006.5 = $y_3$
    35 = $x_4$   -  -    3642.5 = $y_4$
    45 = $x_5$   -  -    4843.5 = $y_5$
    55 = $x_6$   -  -    5673.5 = $y_6$
    65 = $x_7$   -  -    6107.5 = $y_7$
    80 = $x_8$   -  -    6300   = $y_8$


Use Lagrange's formula (μ) to determine the number under
30 years, ignoring persons over 55. Thus x = 30.

$$y = 193.5 \times \frac{10 \cdot 5\,(-5)(-15)(-25)}{(-5)(-10)(-20)(-30)(-40)}$$

$$+\ 1073.5 \times \frac{15 \cdot 5\,(-5)(-15)(-25)}{5\,(-5)(-15)(-25)(-35)}$$

$$+\ 2006.5 \times \frac{15 \cdot 10\,(-5)(-15)(-25)}{10 \cdot 5\,(-10)(-20)(-30)}$$

$$+\ 3642.5 \times \frac{15 \cdot 10 \cdot 5\,(-15)(-25)}{20 \cdot 15 \cdot 10\,(-10)(-20)}$$

$$+\ 4843.5 \times \frac{15 \cdot 10 \cdot 5\,(-5)(-25)}{30 \cdot 25 \cdot 20 \cdot 10\,(-10)}$$

$$+\ 5673.5 \times \frac{15 \cdot 10 \cdot 5\,(-5)(-15)}{40 \cdot 35 \cdot 30 \cdot 20 \cdot 10}$$

$$= 2879.$$
Mr. Booth's diagram gives 2824.5 for the same position,
using $y_3$ and $y_4$ only.

If in the formula the quantities $y_2$, $y_3$, $y_4$, $y_5$ only are used,
y is found to be 2869.
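
The three figures just quoted can be checked with a direct transcription of Lagrange's formula. The function name `lagrange` and the list `booth` are introduced here for illustration; the data are the cumulative proportions tabulated above.

```python
def lagrange(points, x):
    """Value at x of the polynomial through the given (x_t, y_t) pairs."""
    total = 0.0
    for t, (xt, yt) in enumerate(points):
        weight = 1.0
        for s, (xs, _) in enumerate(points):
            if s != t:
                weight *= (x - xs) / (xt - xs)   # multiplier of y_t
        total += yt * weight
    return total

# Ages 15 to 55 with the proportion occupied per 10,000 under each age.
booth = [(15, 193.5), (20, 1073.5), (25, 2006.5),
         (35, 3642.5), (45, 4843.5), (55, 5673.5)]

six_point = lagrange(booth, 30)         # all six ordinates y_1 ... y_6
four_point = lagrange(booth[1:5], 30)   # y_2 ... y_5 only
two_point = lagrange(booth[2:4], 30)    # y_3 and y_4 only (straight line)
print(round(six_point), round(four_point, 2), two_point)
```

The six-point value is about 2878.8, the four-point value 2869.25, and the two-point value 2824.5, in agreement with the figures of the text.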

Lagrange's formula as used above is equivalent to the
assumption that the 6th differences vanish when the ages are
uniformly graded. Write a, b, c for the values of y at 30, 40
and 50 years.

Using formula (β) or (η) for the values $y_1$, $y_2$, $y_3$, $a$, $y_4$, $b$,
$y_5$ we have

$$y_1 - 6y_2 + 15y_3 - 20a + 15y_4 - 6b + y_5 = 0,$$

and similarly

$$y_2 - 6y_3 + 15a - 20y_4 + 15b - 6y_5 + c = 0$$

and

$$y_3 - 6a + 15y_4 - 20b + 15y_5 - 6c + y_6 = 0.$$

Whence by straightforward solution a = 2879 as above.
This method, when applicable, is simpler than Lagrange's
formula.
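
The "straightforward solution" can be retraced as follows. The sketch moves the unknowns a, b, c to the left of each equation and solves by determinants; the helper names `det3` and `cramer` are introduced here for illustration, and any other method of solving three linear equations would serve equally.

```python
y1, y2, y3, y4, y5, y6 = 193.5, 1073.5, 2006.5, 3642.5, 4843.5, 5673.5

# Coefficients of (a, b, c) in the three equations, in the order given above.
A = [[-20.0, -6.0, 0.0],
     [15.0, 15.0, 1.0],
     [-6.0, -20.0, -6.0]]
# Known ordinates carried to the right-hand side.
rhs = [-(y1 - 6 * y2 + 15 * y3 + 15 * y4 + y5),
       -(y2 - 6 * y3 - 20 * y4 - 6 * y5),
       -(y3 + 15 * y4 + 15 * y5 + y6)]

def det3(m):
    """Determinant of a 3 x 3 matrix, expanded along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cramer(matrix, right):
    """Solve a 3 x 3 linear system by Cramer's rule."""
    d = det3(matrix)
    solution = []
    for col in range(3):
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = right[r]
        solution.append(det3(m) / d)
    return solution

a, b, c = cramer(A, rhs)
print(round(a, 1), round(b, 1), round(c, 1))
```

The value of a so found is about 2878.8, the same as that given by the six-point Lagrange formula.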

(4)  As  an  example  of  the  determination  of  the  median 
and  the  mode,  we  will  use  the  figures  already  employed  on 
p.  69,  which  may  be  retabulated  thus  : — 


  Earning                              Differences.
  less than     x.       y.       1st.      2nd.      3rd.

   $ .25        -1         0
                                   317
     .75         0       317                1157
                                  1472               -1332
    1.25         1      1789                -175
                                  1297                -152
    1.75         2      3086                -327
                                   970                -137
    2.25         3      4056                -464
                                   506
    2.75         4      4562

The whole number of persons is 5123.

To find the median put y = 2562, and use the entries from
x = 0 to x = 4.

Then

$$2562 = 317 + 1472x - \tfrac{1}{2} \text{ of } 175\,x(x-1) - \tfrac{1}{6} \text{ of } 152\,x(x-1)(x-2) + \tfrac{1}{24} \text{ of } 15\,x(x-1)(x-2)(x-3),$$

if we stop at the 4th difference.

$$61488 = 7608 + 36122x - 111x^2 - 698x^3 + 15x^4,$$

and the solution by Horner's method is x = 1.5715.

Hence the median is at $.75 + 1.5715 of .50 = $1.536.

Another method is to suppose x expressed as a function of y*
and to write Lagrange's formula:

$$x = \frac{(y-y_1)(y-y_2)(y-y_3)}{(y_0-y_1)(y_0-y_2)(y_0-y_3)}\,x_0 + \ldots$$

* Cf. Edgeworth in the Statistical Journal, 1898, p. 698.

If we use four entries only in the above table, we have

$$x = \frac{(2562-1789)(2562-3086)(2562-4056)}{-1472 \times -2769 \times -3739} \text{ of } 0 + \ldots + \ldots + \ldots,$$

whence x = 1.5624 and the median is $1.531.


This  method  is  suitable  for  working  on  a  calculating 
machine. 
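
Both routes to the median can be retraced numerically. In the sketch below the quartic is solved by simple bisection, used here in place of Horner's method, and the inverse interpolation applies Lagrange's formula with x and y interchanged; the function names are introduced for illustration only.

```python
def quartic(x):
    # The equation of the text with every term carried to one side.
    return 7608 + 36122 * x - 111 * x**2 - 698 * x**3 + 15 * x**4 - 61488

def bisect(f, lo, hi, tol=1e-10):
    """Root of f between lo and hi, assuming f changes sign on the interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def lagrange(points, t):
    """Value at t of the polynomial through the given pairs."""
    total = 0.0
    for i, (ti, vi) in enumerate(points):
        weight = 1.0
        for j, (tj, _) in enumerate(points):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += vi * weight
    return total

root = bisect(quartic, 1.0, 2.0)
median_direct = 0.75 + 0.50 * root

# Inverse interpolation: pairs are (y, x) for x = 0, 1, 2, 3.
inverse_table = [(317, 0), (1789, 1), (3086, 2), (4056, 3)]
x_inverse = lagrange(inverse_table, 2562)
median_inverse = 0.75 + 0.50 * x_inverse
print(round(root, 4), round(median_direct, 3))
print(round(x_inverse, 4), round(median_inverse, 3))
```

The root comes out close to the 1.5715 of the text, and the inverse interpolation close to 1.5624; the two medians round to $1.536 and $1.531 respectively.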

To find the mode use the entries from x = -1 to x = 2.

The second and third differences in the formula of p. 228
are now 1157 and -1332.

The required value is $.75 + $\frac{1157}{1332}$ of .50 = $1.18.

Variations  of  method  can  be  used,  leading  to  slightly 
different  results.  The  mode  is,  in  fact,  not  precisely  determinate 
when  the  grading  is  so  wide  and  the  higher  differences  do  not 
tend  to  zero. 

This method is applicable to such problems as the determination
of the date at which the population, the marriage,
birth, and death rates, etc., increased most rapidly; at what
age the chance of death increases most, etc.*

11. An important group of problems of interpolation arises
when the original returns have to be corrected, e.g., the determination
of the distribution by age from the census returns.

We  have  now  the  problem  of  drawing  a  smooth  line  in  the 
neighbourhood  of  a  great  number  of  points,  but  not  necessarily 
through  any  of  them.  The  assumption  is  that  the  returns  are 
insufficient  in  number  or  deficient  in  accuracy,  and  that  they 
indicate  a  regular  distribution  which  it  is  required  to  represent. 

(1)  One  method  is  to  assume  that  the  averages  over  fairly 
large  groups  are  accurate,  and  to  these  averages  to  apply  any 
of  the  methods  already  discussed. 

(2)  A  second  method  has  been  used  in  the  section  in  which 
various  curves  were  smoothed  (vide  supra,  Chapter  VII).  This 
may  be  restated  as  follows  : — Take  successive  groups  of  2,  or 
3,  or  4  ....  10  points,  beginning  again  and  again  at  the 
ordinates  for  each  of  the  given  abscissae.  Find  the  centres  of 
gravity  of  each  group ;  that  is,  erect  an  ordinate  equal  to  the 
average  of  the  ordinates  of  a  group  at  the  point  half-way 
between  the  ends  of  the  abscissae  of  the  outside  ordinates  of  the 
group. Draw a line through the points so obtained. It will
be found that this line satisfies all the conditions laid down. An
example of this method is given in the diagram facing p. 134.

* Cf. Edgeworth, in Statistical Journal, 1899, p. 381, and the references
there given.

(3) In another method the original figures are smoothed
till the differences of the fourth or fifth or higher orders vanish;
and then the ordinary formulae of interpolation are applied.
Thus in example 1, on p. 233, rewrite the table thus:

  Wages          Smoothed                Corrected Differences.
  above 15s.     Numbers.
                                   1st.             2nd.               3rd.

  Up to 20s.     296
                                   303 + a
    ,,  25s.     599 + a                            -98 - a + b
                                   205 + b                             7 - 3b
    ,,  30s.     804 + a + b                        -91 - a - 2b
                                   114 - a - b                         25 + 2a + 3b
    ,,  35s.     918                                -66 + a + b
                                   48
    ,,  40s.     966

If we put $b = 2\tfrac{1}{3}$, $a = -16$, the third differences vanish,
and we have $\Delta_0^1 = 287$, $\Delta_0^2 = -79\tfrac{2}{3}$, $\Delta_0^3 = \Delta_0^4 = 0$; when
x = 25, y = 583, and when

$$x = 24, \quad y = 296 + \tfrac{4}{5} \text{ of } 287 - \tfrac{2}{25} \text{ of } (-79\tfrac{2}{3}) = 531.97,$$

so that the number earning as much as 24s. and not so much as
25s. is now found to be 51, instead of 52.

The corrections may be applied to any of the original figures.

We need to solve only one more equation to complete our
table from 20s. to 30s.

When x = 23, $y = 296 + \tfrac{3}{5} \text{ of } 287 + \tfrac{3}{25} \text{ of } 79\tfrac{2}{3}$. The
difference between this and the value of y, when x = 24, is

$$\tfrac{1}{5} \text{ of } 287 - \tfrac{1}{25} \text{ of } 79\tfrac{2}{3} = 54.21.$$

We have therefore the following table, where the figures
in italics have already been calculated, while the others are
added on the assumption that the third differences are zero.

                                      Differences.
  Wages.         Numbers.
                               1st.       2nd.      3rd.

  Up to 20s.       296
                               63.75
    ,,  21s.       360                    -3.18
                               60.57                 0
    ,,  22s.       420                    -3.18
                               57.39                 0
    ,,  23s.       478                    -3.18
                               54.21                 0
    ,,  24s.       532                    -3.18
                               51.03                 0
    ,,  25s.       583                    -3.18
                               47.85                 0
    ,,  26s.       631                    -3.18
                               44.67                 0
    ,,  27s.       676                    -3.18
                               41.49                 0
    ,,  28s.       717                    -3.18
                               38.31                 0
    ,,  29s.       755                    -3.18
                               35.13
    ,,  30s.       790
If we had taken the second differences more exactly, we
should have obtained $804 + a + b = 790\tfrac{1}{3}$ for the last figure
as in the previous table.

This method of writing down many figures when the significant
differences have been found can be very generally applied
also in the cases where the data are exact.
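
The construction of the yearly table from the two significant differences may be sketched as follows. Exact fractions are used so that the third differences vanish identically; the function name `smoothed` is introduced here for illustration.

```python
from fractions import Fraction as F

a, b = F(-16), F(7, 3)                  # corrections making 3rd differences vanish
assert 7 - 3 * b == 0 and 25 + 2 * a + 3 * b == 0

d1 = 303 + a                            # leading first difference, 287
d2 = -98 - a + b                        # leading second difference, -79 2/3

def smoothed(shillings):
    """Number earning up to the given wage, from y0 = 296 at 20s. (h = 5s.)."""
    u = F(shillings - 20, 5)
    return 296 + u * d1 + u * (u - 1) / 2 * d2

table = {s: float(smoothed(s)) for s in range(20, 31)}
for s in range(20, 31):
    print(s, round(table[s], 2))
```

The entries for 24s., 25s. and 30s. are 531.97, 583 and 790.33, and the difference between 23s. and 24s. is 54.21, as in the text.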

(4)  Another  method,  involving  higher  mathematics,  would 
be  discussed  more  suitably  after  the  section  devoted  to  the 
law  of  error;  a  brief  explanation  with  a  useful  formula  may, 
however,  be  offered  here. 

Suppose we have five consecutive points $(-2, y_{-2})$,
$(-1, y_{-1})$, $(0, y_0)$, $(1, y_1)$, $(2, y_2)$ given.

A parabola of the fourth order could be drawn through these
five points, but would have two points of inflexion. A great
number of parabolas of the third order can be drawn near all
the points, having no points of inflexion, and satisfying all the
ordinary conditions of interpolation.

Borrowing a principle from the method of least squares,*
we assume that if the coefficients of the parabola

$$y = a + bx + cx^2 + dx^3$$

are chosen so as to make the quantity

$$\Sigma (a + bx + cx^2 + dx^3 - y)^2$$

(where the summation extends over the five pairs of values of
x and y) a minimum, the parabola so determined will be the
best for the purpose.

For the necessary mathematical analysis, Professor Darwin's
paper On Fallible Measures,† from which this method is taken,
should be consulted.

The following equation is obtained:

$$a = y_0 - \tfrac{3}{35} \times \Delta_0^4,$$

where $\Delta_0^4$ is the difference of the fourth order for the y's.

Now replace the point $(0, y_0)$ by the intersection of its
ordinate with the parabola, that is by $(0, a)$, where a has the
value just given, that is, diminish $y_0$ by the quantity $\tfrac{3}{35}\Delta_0^4$.

Repeat the same process for each point on the original line,
taking it as the middle of a group of 5, and a smooth curve
lying very near all the original points is obtained.

Thus we may smooth line C in diagram facing p. 146.

* See Part II, Appendix, Note 10.
† See Phil. Mag. and Journal, July 1877.
         Imported Wheat
         per head of the              Differences.
         Population.                                        Smoothed Figures.
            lbs.           1st.    2nd.    3rd.    4th.

  1890      226
                            18
  1891      244                    -17
                             1              19
  1892      245                      2             -16     245 + 3/35 of 16  = 246 1/3
                             3               3
  1893      248                      5              13     248 - 3/35 of 13  = 247
                             8              16
  1894      256                     21             -94     256 + 3/35 of 94  = 264
                            29             -78
  1895      285                    -57             134     285 - 3/35 of 134 = 273 1/2
                           -28              56
  1896      257                     -1             -16     257 + 3/35 of 16  = 258 1/3
                           -29              40
  1897      228                     39
                            10
  1898      238

The  statistics  of  wheat  consumption  are  inexact  because 
of  the  variation  of  the  stocks  at  the  end  of  each  year,  of  which 
no  record  was  available.  Hence  it  is  reasonable  to  regard  the 
numbers  as  subject  to  amendment  and  smooth  off  irregularities. 
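
The smoothing rule may be applied mechanically to the wheat figures. The sketch below takes each year in turn as the middle of a group of five and diminishes it by 3/35 of its fourth difference; the names `wheat` and `smooth_five` are introduced here for illustration.

```python
wheat = {1890: 226, 1891: 244, 1892: 245, 1893: 248, 1894: 256,
         1895: 285, 1896: 257, 1897: 228, 1898: 238}

def smooth_five(values):
    """Replace each interior value by y0 - 3/35 of its fourth central difference."""
    out = []
    for i in range(2, len(values) - 2):
        ym2, ym1, y0, y1, y2 = values[i - 2:i + 3]
        d4 = ym2 - 4 * ym1 + 6 * y0 - 4 * y1 + y2    # fourth central difference
        out.append(y0 - 3 * d4 / 35)
    return out

years = sorted(wheat)
smoothed = dict(zip(years[2:-2], smooth_five([wheat[y] for y in years])))
for year, value in smoothed.items():
    print(year, round(value, 2))
```

The rule is the same as weighting the five ordinates by -3, 12, 17, 12, -3 and dividing by 35; the first two and last two years of the series have no complete group of five and are left as they stand.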

(5) A more general problem of interpolation is to find an
algebraic formula, other than the parabolic equation so far
used, which expresses a whole series or group. A short introduction
to such formulae will be found in Part II, Chap. V,
below.


Note. Formula (λ) is due to Professor Everett, who gave the general
term and proof (Quarterly Journal of Pure and Applied Mathematics, No. 128,
1901, formula G). A proof can be obtained as follows:

If $f(x) = \cosh(2q \sinh^{-1} \tfrac{1}{2}x)$, it is readily shown that
$f^{(n+2)}(0) = (q^2 - \tfrac{1}{4}n^2)\,f^{(n)}(0)$,
and thence by Maclaurin's Theorem the expansion of $f(x)$ is

$$1 + \frac{1}{2!}q^2x^2 + \frac{1}{4!}q^2(q^2-1^2)x^4 + \frac{1}{6!}q^2(q^2-1^2)(q^2-2^2)x^6 + \ldots = \cosh(2q \sinh^{-1} \tfrac{1}{2}x).$$

After differentiating and dividing by $qx$ we obtain

$$q + \frac{1}{3!}q(q^2-1^2)x^2 + \frac{1}{5!}q(q^2-1^2)(q^2-2^2)x^4 + \ldots = \frac{\sinh(2q \sinh^{-1} \tfrac{1}{2}x)}{\sinh(2 \sinh^{-1} \tfrac{1}{2}x)} = \frac{\sinh(qhD)}{\sinh(hD)}, \text{ where } x = 2\sinh\left(\frac{hD}{2}\right).$$

In the notation of p. 228, $\delta_0^2 = (e^{hD} - 2 + e^{-hD})\,y_0 = (e^{hD/2} - e^{-hD/2})^2\,y_0$, so that
the operator $\delta = 2\sinh\left(\frac{hD}{2}\right)$.

$$y_{ph} = e^{phD}(y_0) \quad \text{and} \quad y_1 = e^{hD}(y_0).$$

Since $p + q = 1$,

$$e^{phD} = \{e^{(p+1)hD} - e^{(p-1)hD}\} \div (e^{hD} - e^{-hD}) \text{ identically,}$$

$$= \{e^{qhD} - e^{-qhD} + (e^{phD} - e^{-phD})\,e^{hD}\} \div (e^{hD} - e^{-hD}),$$

$$= \frac{\sinh(qhD)}{\sinh(hD)} + \frac{\sinh(phD)}{\sinh(hD)}\,e^{hD}.$$

$$\therefore \quad y_{ph} = \frac{\sinh(qhD)}{\sinh(hD)}\,y_0 + \frac{\sinh(phD)}{\sinh(hD)}\,y_1.$$

x in the above series is identified as δ, and we have, using the series first
as expressing operators on $y_0$, and secondly (after p is written for q) as
expressing operators on $y_1$,

$$y_{ph} = q\,y_0 + \frac{1}{3!}q(q^2-1^2)\,\delta_0^2 + \frac{1}{5!}q(q^2-1^2)(q^2-2^2)\,\delta_0^4 + \frac{1}{7!}q(q^2-1^2)(q^2-2^2)(q^2-3^2)\,\delta_0^6 + \ldots$$

$$+\ p\,y_1 + \frac{1}{3!}p(p^2-1^2)\,\delta_1^2 + \frac{1}{5!}p(p^2-1^2)(p^2-2^2)\,\delta_1^4 + \frac{1}{7!}p(p^2-1^2)(p^2-2^2)(p^2-3^2)\,\delta_1^6 + \ldots$$

that is formula (λ) generalized.

For further information on the subject of interpolation, the reader is
referred to Dr. Farr's Life Table (No. 3), 1864, Boole's Finite Differences,
Text-Book of Institute of Actuaries, Part II., p. 420 seq., Rice's Theory and
Practice of Interpolation, 1899, Merrifield On Quadratures and Interpolation
(British Association Report, 1880), Chauvenet's Spherical and Practical
Astronomy (Chap. II.), Woolhouse in the Assurance Magazine (Vols. XI.,
XII.), Professor J. D. Everett On the Algebra of Difference Tables (Quarterly
Journal of Mathematics, No. 124, 1900), On a Central-difference Interpolation
Formula (British Association Report, 1900), and in the Journal of the
Institute of Actuaries, January 1901, and Dr. W. F. Sheppard's Papers On
Central Difference Formulae (Proceedings of the London Mathematical Society,
Vol. XXXI., Nos. 707-710), and On the Use of Auxiliary Curves in Statistics
of Continuous Variation (Statistical Journal, September 1900). In these
other references will be found.
PART II.


APPLICATIONS  OF  MATHEMATICS  TO 

STATISTICS. 


CHAPTER  I. 

INTRODUCTORY.  FREQUENCY  CURVES. 

Introductory. 

Mathematical  processes  are  essential  in  very  many  parts 
of  the  statistical  field,  and  in  the  first  part  of  this  book 
algebraic  methods  have  been  used  for  the  generalisation  of 
arithmetical  results  and  for  the  simpler  cases  of  interpolation. 
There  are,  however,  many  classes  of  problems  which  necessitate 
mathematical  treatment  of  a  rather  special  nature,  and  it  is 
to  the  consideration  of  some  of  these  that  this  second  part  is 
devoted.  The  whole  field  is  too  wide  to  cover,  and  selection 
has  been  made  of  those  methods  which  are  fundamental  and 
of  those  problems  which  are  of  direct  interest  to  students  of 
political  economy  and  allied  sciences.  Essentially  the  same 
methods  are  needed  for  statistical  problems  in  medicine, 
biology  and  other  sciences,  and  their  use  can  be  followed  in 
the  appropriate  journals.  Here  it  has  seemed  best  to  keep, 
as  a  general  principle,  to  those  questions  which  have  arisen 
in  connection  with  economic  and  social  investigation,  and  to 
take  examples  mainly  from  this  limited  region. 

So far as the manifold and diverse applications can be
classified, they fall into three groups: (1) the systematic
description of groups, (2) the measurement of relationship
between phenomena, (3) the measurement of the precision of
results obtained by a process of sampling. The background
of the great part of the relevant analysis is the theory of chance,
carried to a point which is reached only by the relatively small
number  of  mathematicians  who  have  specialised  in  that 
subject.  Since  it  cannot  be  assumed  that  readers  are  familiar 
with  any  but  the  simpler  cases  of  algebraic  probability  *  and 
there  are  no  familiar  text-books  in  English  to  which  reference 
can  be  made,  it  has  been  necessary  to  devote  a  good  deal  of 
space  to  purely  mathematical  treatment ;  but  an  effort  has 
been  made  to  render  the  treatment  intelligible  to  those  who 
have  had  some  mathematical  training,  but  are  not  specialists 
in  the  subject.  Thus  where  possible  the  proofs  have  been 
given  without  the  use  of  the  Infinitesimal  Calculus ;  the 
results  have  been  stated  as  clearly  as  possible  in  words  and 
illustrated  by  arithmetical  examples  ;  the  simplest  cases  have 
been  dealt  with  first  to  elucidate  the  processes  and  results, 
while  the  more  general  treatment  has  been  given  in  outline 
with  reference  to  papers  or  journals  where  a  complete  analysis 
has  been  found.  Non-mathematical  readers  are  recommended 
to  omit  the  parts  printed  in  small  type.  In  the  Appendix 
are  collected  some  theorems  whose  proofs  are  not  elsewhere 
very  easily  accessible,  and  to  it  are  relegated  some  parts  of 
the  analysis  which  are  too  unwieldy  for  the  text. 


Frequency  Groups  and  Curves. 


The  remainder  of  this  chapter  is  devoted  to  the  systematic 
measurement  of  frequency  groups. 


[Diagram: ordinates $M_1P_1 = y_1$, $M_2P_2 = y_2$, $M_3P_3$, . . . erected at the
points $M_1$, $M_2$, $M_3$, . . . of the axis OX.]

Let there be any group of measurements such that, an axis Ox
being taken on which a scale is marked, $y_1$ instances are found to
have the measurement $x_1$, $y_2$ the measurement $x_2$, and so on; then
the group can be represented as in the diagram, where $OM_1 = x_1$,
$M_1P_1 = y_1$, etc.

* For elementary treatment, see Whitworth's Choice and Chance.

It is not necessary that the grades $M_1M_2$, $M_2M_3$ . . . should
be equal.

Let n be the whole number in the group, so that

$$n = y_1 + y_2 + \ldots$$

Then the "frequencies" of observations at $x_1$, $x_2$, etc., are

$$\frac{y_1}{n},\ \frac{y_2}{n},\ \text{etc.}$$

If the points $P_1$, $P_2$, $P_3$ . . . can be regarded as lying on a
continuous curve, then their locus is a "frequency curve."

If  the  measurements  do  not  fall  into  grades,  or  in  sub-groups 
at  particular  values,  but  each  observation  has  a  distinctive 
measurement,  then  the  group  can  be  represented  by  a  loaded  axis 
on  which  each  item  is  marked  by  a  dot, 


and  a  great  part  of  the  following  formulae  is  applicable  to  such 
a  loaded  line  as  well  as  to  a  frequency  curve. 

Measurements  of  the  members  of  a  group  are  frequently  massed 
in  grades  (as  20-25,  25-30,  .  .  .  years)  or  originally  made  to  the 
nearest  unit  (as  55-56,  56-57  .  .  .  inches).  In  such  cases  the 
number  in  each  grade  is  approximately  represented  by  a  rectangle 
(as  M2M3R2Q2  on  the  grade  M2M3). 


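[Diagram: rectangles of altitudes y1, y2, . . . standing on the grades M1M2, M2M3, . . . of the axis OX; the rectangle M2M3R2Q2 stands on the grade M2M3.]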


Let $h$ be the breadth of each grade, $x_1$, $x_2$ . . . the abscissae of
their middle points, $y_1$, $y_2$ . . . the altitudes of the rectangles,
and $y_1h$, $y_2h$ . . . the numbers recorded in the grades. Then

$$n = y_1h + y_2h + \ldots = y_1 + y_2 + \ldots$$

if $h$ is taken as the unit.

The frequencies in the grades are

$$\frac{y_1h}{n},\quad \frac{y_2h}{n},\quad \text{etc.}$$

If a continuous curve can be defined and constructed so that
the parts of its area standing on $M_1M_2$, $M_2M_3$ . . . are proportional
to $y_1h$, $y_2h$ . . . , then this is the frequency curve of the group.

Variation  is  a  general  law  of  nature  and  is  found  in  most 
human  affairs,  so  that  large  scale  observations  usually  lead  to 
frequency  groups.  Four  classes  can  be  distinguished  :  (a)  where 
every  member  of  a  group  has  been  measured,  e.g.  the 
wages of every adult male working in a trade; (b) observations
of samples selected from a group, e.g. the number of
children  in  each  of  1,000  families  chosen  in  a  town  where  there 
are  50,000  families,  or  the  measurement  of  leaves  of  a  tree  of 
a  particular  kind  ;  (c)  repeated  measurements  of  a  physical 
quantity  (e.g.  of  the  declination  of  a  star)  where  the  variations 
are due to instrumental errors; (d) the mathematical probabilities
of various numbers of successes (e.g. the chances of
obtaining  1,  2,  3  .  .  .  heads  when  50  coins  are  tossed)  or  the 
frequencies  of  events  whose  magnitude  depends  on  an  unknown 
complex  of  causes. 

To  whichever  class  the  phenomena  belong,  the  same  general 
method  of  describing  the  group  is  appropriate.  This  method 
is to select certain algebraic functions of the x's and y's and
to evaluate them for the particular group. The group is in
fact described (1) by determining a central position, (2) by
measuring the dispersion of the observations from this centre,
(3) by measuring any want of symmetry about its centre,
(4) by further measurements depending on the shape of the
diagram which represents the group.

For  the  central  position  we  can  use  the  arithmetic  average, 
the  median,  the  mode  or,  in  some  cases,  the  geometric  mean. 
The  arithmetic  average  is  necessary  in  most  cases  in  further 
calculations  and  must  be  taken  as  the  usual  starting  point. 
The  median  does  not  lend  itself  readily  to  general  algebraic 
work, is not always known precisely, and need only be calculated
for special purposes. The mode is not generally determinable
exactly from the observations and the introduction
of  approximation  at  the  beginning  of  the  calculations  should 
be  avoided ;  if,  however,  we  have  a  definite  algebraic  formula 
for  the  group,  the  mode  can  be  exactly  obtained  and  is  often 
important.  (Part  I,  Chapter  V). 

For  measurement  of  dispersion  we  may  use  the  “  probable 
error,”  i.e.,  the  half-interquartile  range,  or  the  mean  deviation, 
or the deviation of mean square. Of these the probable error,
like  the  median,  can  often  only  be  found  approximately  and 
is  difficult  to  use  systematically  in  further  measurements. 
The  mean  deviation,  apart  from  an  ambiguity  in  the  position 
of  the  origin  from  which  the  deviations  are  to  be  measured, 
introduces  in  further  work  a  serious  difficulty  because  the 
first  measurements  are  taken  irrespective  of  their  sign.  The 
deviation  of  mean  square  on  the  other  hand  is  free  from  all 
these  difficulties,  being  defined  uniquely  as  the  square  root 
of the average of the squares of the deviations of single measurements
from their average, and not only is easy to handle
algebraically, but also necessarily enters into many calculations.
It is called the standard deviation and is universally
used  in  mathematical  statistics.  (Part  I,  Chapter  VI.) 

Want  of  symmetry  in  a  curve  is  indicated  by  the  want  of 
coincidence  of  the  median,  mode  and  arithmetic  average,  and 
by  inequality  of  the  distances  from  the  median  to  the  lower 
and  upper  quartiles.  On  any  such  quantities,  which  are  zero 
when  the  group  is  symmetrical,  a  measurement  can  be  based ; 
but  the  median,  mode  and  quartiles  can  often  only  be  found 
approximately,  a  resulting  measurement  is  specially  subject 
to  any  imperfections  resulting  from  paucity  of  observations, 
and  a  change  in  magnitude  of  an  observation  has  no  influence 
if  it  does  not  transfer  it  across  the  median  or  a  quartile. 

We  need  a  measurement  which  is  sensitive  to  the  position 
of every observation. It would be possible to take the difference
between the mean deviations of observations above and
observations below the average, but this would not lead to
a formula readily put in line with other systematic measurements.
It is found that the deviation of mean cube (the
average  of  the  third  powers  of  the  deviations  of  observations 
from  their  average,  taken  positively  or  negatively  as  they 
occur)  is  free  from  all  difficulties,  and  it  is  evidently  sensitive 
to  all  want  of  symmetry  or  “  skewness.” 

In  measuring  deviation  it  is  natural  and  usual  to  express 
the  result  in  concrete  terms  as  so  many  inches,  lbs.,  or  other 
units,  and  the  standard  deviation,  probable  error,  and  mean 
deviation  are  so  expressed.  But  in  measuring  skewness  there 
is  no  obvious  concrete  unit  and  it  is  convenient  to  construct 
the  measurement  so  as  to  be  independent  of  the  unit  used ; 
this is obtained by expressing the deviation of each observation
from the average as a multiple of the standard deviation;
thus if $x$ is a measurement, $\bar{x}$ the average, and $\sigma$ the standard
deviation, the quantities averaged are $\left(\frac{x - \bar{x}}{\sigma}\right)^3$, and the
resulting measurement of skewness is

$$\frac{1}{n} \times \text{sum of all values of } \left(\frac{x - \bar{x}}{\sigma}\right)^3 .$$

This evidently gives a sensitive measurement,
but on no obvious scale, and it is only by experience of the
shapes of curves and the resulting measures of skewness that
these measures acquire an intelligible meaning.

Further  measurements  can  be  obtained  from  the  mean 
fourth,  fifth,  and  higher  powers.  These  have  been  generalised 
by  Professor  Karl  Pearson  in  his  system  of  moments.  The 
1st, 2nd, 3rd . . . moments are the means of the first, second, third
. . . powers of the deviations; the deviations may be measured
from  any  point  and  the  resulting  moments  are  with  respect 
to  that  point ;  but  the  arithmetic  average  is  generally  taken 
as  the  centre  from  which  measurements  are  made,  and  moments 
with  regard  to  other  points  are  only  used  to  facilitate  calculation. 

In  Part  I,  Chapter  V,  it  was  explained  that  an  average 
was  used  as  a  compact  way  of  describing  a  group,  especially 
when  it  was  desired  to  compare  or  contrast  two  groups.  This 
conception  has  now  been  developed,  and  we  have  a  systematic 
way  of  describing  the  essential  characteristics  by  three  or  more 
symbols,  which  measure  the  average,  the  standard  deviation, 
the  skewness  and  further  analogous  quantities.  As  soon  as 
the  meanings  and  scales  of  these  measurements  are  appreciated, 
we  may  dispense  with  the  original  data  (keeping  them  only 
for  reference  or  as  diagrams),  express  groups  in  a  concentrated 
form, and base calculations showing the relations of groups to
each  other  on  these  quantities  which  are  specially  adapted  to 
mathematical  treatment. 

The  system  is  not  of  universal  applicability,  and  in 
Chapter  V  are  given  examples  of  other  methods  suitable  for 
particular  classes  of  groups. 
Notation of Moments.

The notation and nomenclature used here are as follows:

$$m'_t = \frac{1}{n}\left(x_1^t y_1 + x_2^t y_2 + \ldots\right) = S(x^t y) \div n \qquad (1)$$

is called the $t$th moment of the group about its origin.

$$n = S y \qquad (2)$$

$$m'_1 = \bar{x} = S xy \div S y \qquad (3)$$

is the average of the group.

$$m_t = S (x - \bar{x})^t y \div n \qquad (4)$$

is the $t$th moment about the average.

Then

$$n m_2 = S (x - \bar{x})^2 y = S x^2 y - 2\bar{x}\, S xy + \bar{x}^2 S y = n m'_2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2$$

and

$$\therefore\ m_2 = m'_2 - \bar{x}^2 \qquad (5)$$

$$\sigma = \sqrt{m_2} \qquad (6)$$

is called the standard deviation, as defined above.

$$n m_3 = S x^3 y - 3\bar{x}\, S x^2 y + 3\bar{x}^2 S xy - n\bar{x}^3$$

$$m_3 = m'_3 - 3\bar{x} m'_2 + 2\bar{x}^3 \qquad (7)$$

$m_3$ is zero in a symmetrical curve. To obtain a convenient
measurement of want of symmetry or skewness the abscissae are
expressed as multiples of the standard deviation, thus eliminating
the concrete unit of measurement.

Thus

$$\kappa = S\left\{\left(\frac{x - \bar{x}}{\sigma}\right)^3 y\right\} \div n = \frac{m_3}{\sigma^3} \qquad (8)$$

is a measurement of skewness.*

Similarly,

$$m_4 = m'_4 - 4\bar{x} m'_3 + 6\bar{x}^2 m'_2 - 3\bar{x}^4 \qquad (9)$$

is the fourth moment, and

$$\kappa_2 = \frac{m_4}{m_2^2} \qquad (10)$$

gives a measurement independent of the unit.


*  This  symbol  is  introduced  in  this  book  in  place  of  letters  formerly  used 
to  measure  skewness.  It  is  believed  that  it  will  be  found  convenient. 
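The notation above can be tried out numerically. The following is a minimal sketch in Python, not taken from the book: the function name `grouped_moments` and the small symmetrical group used to call it are assumptions made only for illustration. It follows equations (1) to (9) term by term and forms the two unit-free measurements from the results.

```python
from math import sqrt


def grouped_moments(xs, ys):
    """Moments of a frequency group: values xs occurring ys times."""
    n = sum(ys)                                            # equation (2)
    # Moments about the origin, equation (1): m'_t = S(x^t y) / n
    raw = [sum(x ** t * y for x, y in zip(xs, ys)) / n for t in range(5)]
    mean = raw[1]                                          # equation (3)
    m2 = raw[2] - mean ** 2                                # equation (5)
    m3 = raw[3] - 3 * mean * raw[2] + 2 * mean ** 3        # equation (7)
    m4 = (raw[4] - 4 * mean * raw[3]
          + 6 * mean ** 2 * raw[2] - 3 * mean ** 4)        # equation (9)
    sigma = sqrt(m2)                                       # equation (6)
    return {
        "n": n, "mean": mean, "m2": m2, "m3": m3, "m4": m4,
        "sigma": sigma,
        "kappa": m3 / sigma ** 3,                          # equation (8)
        "kappa2": m4 / m2 ** 2,    # fourth moment in units of sigma^4
    }


# Hypothetical symmetrical group, chosen only to illustrate the call.
result = grouped_moments([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])
print(result)
```

For a symmetrical group such as this one the third moment, and with it the skewness, should vanish apart from rounding in the floating-point arithmetic.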
The standard deviation being given, the more the members
of the group are dispersed from the centre, the greater is $\kappa_2$.
In the particular case of the normal curve of error (p. 269),
$\kappa_2 = 3$. If without altering $\sigma$ the central height is depressed
and the outlying parts pushed further out than in the normal
curve, then $\kappa_2 > 3$.

Professor Karl Pearson uses $\beta_1 = \frac{m_3^2}{m_2^3}$, so that $\sqrt{\beta_1} = \kappa$ as
given above, and he and Mr. Yule use a more elaborate formula
for skewness. Also he writes $\mu_t$ instead of $m_t$, and $\beta_2$ for $\kappa_2$.
Professor Edgeworth, following earlier practice, frequently
uses $c = \sqrt{2m_2}$ (called the modulus), for the unit of reduction
instead of $\sigma$, so that $c = \sigma\sqrt{2}$. On the whole the saving of
complexity in some formulae by the use of $c$ may be held not to
compensate the use of an additional letter, for in any case the
standard deviation must be used.

Edgeworth also uses $j$ for $\frac{m_3}{c^3}$, so that $\kappa = 2\sqrt{2}\,j$, and $i$ for
$\frac{m_4}{c^4} - \frac{3}{4}$. Then $i$ is zero in the normal curve of error.


Illustrations of the Calculation of Moments.

In the following examples methods of calculating the
essential measurements $\bar{x}$, $\sigma$, $\kappa$, $\kappa_2$ are given.

In  very  few  cases  has  it  been  found  necessary  or  expedient 
to  use  higher  moments  than  the  fourth  for  descriptive  work, 
and  it  is  well  that  this  is  so,  for  the  errors  incident  to  the 
obtaining  of  higher  moments  from  actual  observations  are 
generally  so  considerable  as  to  render  them  useless. 

1.  In  the  first  example  a  fairly  homogeneous  group  of 
physical  measurements  is  taken,  viz.,  the  weights  of  3,404  boys 
of  nearly  the  same  age.  If  their  heights  (given  on  p.  385)  were 
symmetrically  distributed,  it  is  to  be  expected  that  their  weights 
would show a positive skewness, and in fact $\kappa = .643$. One
boy of exceptional physique (height 5 ft. 4 in., weight 14 stones)
is excluded in the calculation of moments. The curve is not
far removed from normality, for $\kappa_2 - 3$ equals only .457.
Weight of Boys, 14 to 15 Years of Age, Granted Employment
Certificate in New York.

Weight. Scale. Number.               Products.
 lbs.     x       y       xy      x²y       x³y        x⁴y

  65-    -7       3      -21      147    -1,029      7,203
  70-    -6       9      -54      324    -1,944     11,664
  75-    -5     142     -710    3,550   -17,750     88,750
  80-    -4     301   -1,204    4,816   -19,264     77,056
  85-    -3     289     -867    2,601    -7,803     23,409
  90-    -2     380     -760    1,520    -3,040      6,080
  95-    -1     416     -416      416      -416        416
 100-     0     404        0        0         0          0
 105-     1     315     +315      315      +315        315
 110-     2     320     +640    1,280    +2,560      5,120
 115-     3     262     +786    2,358    +7,074     21,222
 120-     4     221     +884    3,536   +14,144     56,576
 125-     5     131     +655    3,275   +16,375     81,875
 130-     6      76     +456    2,736   +16,416     98,496
 135-     7      52     +364    2,548   +17,836    124,852
 140-     8      20     +160    1,280   +10,240     81,920
 145-     9      29     +261    2,349   +21,141    190,269
 150-    10      14     +140    1,400   +14,000    140,000
 155-    11      10     +110    1,210   +13,310    146,410
 160-    12       2      +24      288    +3,456     41,472
 165-    13       2      +26      338    +4,394     57,122
 170-    14       5      +70      980   +13,720    192,080
 175-    15       1      +15      225    +3,375     50,625

Totals        3,404   +4,906   37,492  +158,356  1,502,932
                      -4,032            -51,246
                        +874           +107,110
The origin is taken at 102.5, and the unit as 5 lbs.

$$\bar{x} = m'_1 = \frac{874}{3404} = .2568$$

$$m'_2 = \frac{37492}{3404} = 11.014$$

$$m'_3 = \frac{107110}{3404} = 31.466$$

$$m'_4 = \frac{1502932}{3404} = 441.519$$

$m_1 = 0$. Average $102.5 + .2568 \times 5 = 103.784$ lbs.

$$m_2 = m'_2 - \bar{x}^2 = 10.948$$

$$m_3 = m'_3 - 3\bar{x} m'_2 + 2\bar{x}^3 = 23.01$$

$$m_4 = m'_4 - 4\bar{x} m'_3 + 6\bar{x}^2 m'_2 - 3\bar{x}^4 = 413.542$$

$m_2$ corrected,* $= 10.948 - \frac{1}{12} = 10.865$. $\sigma = 3.296$, i.e. 16.48 lbs.

$m_4$ corrected,* $= m_4 - \frac{1}{2} m_2 + \frac{7}{240} = 413.542 - 5.474 + .029 = 408.10$

$$\kappa = \frac{m_3}{\sigma^3} = .643 = \sqrt{\beta_1} \qquad \kappa_2 = \frac{m_4}{m_2^2} = 3.457 = \beta_2$$

$$c = 4.661 \qquad j = \frac{m_3}{c^3} = .227 \qquad i = .114$$

* Sheppard's corrections, see Appendix, Note 5, p. 439.

In the above table and in similar calculations it is assumed that the
numbers in each grade can be treated as if they were all at the centre of the
grade. Unless the grading is very fine, this exaggerates perceptibly the
second and fourth moments, while if the numbers in the extreme grades are
small the first and third are little affected. If the breadth of the grade is $h$
and not taken as unity, the corrected moments are $m_2 - \frac{1}{12}h^2$ and
$m_4 - \frac{1}{2}h^2 m_2 + \frac{7}{240}h^4$.
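The arithmetic of this example can be checked mechanically. The sketch below, in Python, is an illustration and not part of the original working: the variable names are assumptions, the counts are copied from the table, and the corrections applied are Sheppard's for grades of unit breadth as quoted in the note above.

```python
from math import sqrt

# Numbers y in the grades 65-, 70-, ..., 175- lbs, copied from the table.
counts = [3, 9, 142, 301, 289, 380, 416, 404, 315, 320, 262, 221,
          131, 76, 52, 20, 29, 14, 10, 2, 2, 5, 1]
scale = list(range(-7, 16))        # x: origin at 102.5 lbs, unit 5 lbs
n = sum(counts)


def raw_moment(t):
    """t-th moment about the origin, S(x^t y) / n."""
    return sum(x ** t * y for x, y in zip(scale, counts)) / n


xbar = raw_moment(1)
m2 = raw_moment(2) - xbar ** 2
m3 = raw_moment(3) - 3 * xbar * raw_moment(2) + 2 * xbar ** 3
m4 = (raw_moment(4) - 4 * xbar * raw_moment(3)
      + 6 * xbar ** 2 * raw_moment(2) - 3 * xbar ** 4)

# Sheppard's corrections for grades of unit breadth.
m2_corrected = m2 - 1 / 12
m4_corrected = m4 - m2 / 2 + 7 / 240

sigma = sqrt(m2_corrected)
kappa = m3 / sigma ** 3
kappa2 = m4_corrected / m2_corrected ** 2
average_lbs = 102.5 + 5 * xbar      # back to the concrete unit

print(n, round(average_lbs, 3), round(sigma, 3),
      round(kappa, 3), round(kappa2, 3))
```

Working in the scale unit and converting only the average and the standard deviation back to pounds mirrors the procedure of the text; the two ratios are independent of the unit.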
2. Sauerbeck's 45 index numbers measure the movement of
prices of separate commodities, while their average measures
the general price movement. The 45 numbers may be regarded
as measurements of the general movement subject to individual
chance deviations, and therefore form a frequency
group, whose standard deviation can be used to measure the
precision of the average. The group is moderately unsymmetrical.
The number of cases is so small that it is not
worth while to calculate the 4th moment.


Sauerbeck's Index Numbers of 45 Commodities in 1916.

Numbers.    x      x²          x³      Numbers.    x      x²         x³

    68    -68   4,624    -314,432        138     +2       4          8
    71    -65   4,225    -274,625        148    +12     144      1,728
    84    -52   2,704    -140,608        148    +12     144      1,728
    86    -50   2,500    -125,000        153    +17     289      4,913
    93    -43   1,849     -79,507        154    +18     324      5,832
    96    -40   1,600     -64,000        154    +18     324      5,832
   100    -36   1,296     -46,656        157    +21     441      9,261
   100    -36   1,296     -46,656        159    +23     529     12,167
   101    -35   1,225     -42,875        159    +23     529     12,167
   104    -32   1,024     -32,768        160    +24     576     13,824
   104    -32   1,024     -32,768        161    +25     625     15,625
   107    -29     841     -24,389        163    +27     729     19,683
   114    -22     484     -10,648        163    +27     729     19,683
   114    -22     484     -10,648        166    +30     900     27,000
   119    -17     289      -4,913        168    +32   1,024     32,768
   121    -15     225      -3,375        169    +33   1,089     35,937
   125    -11     121      -1,331        172    +36   1,296     46,656
   128     -8      64        -512        173    +37   1,369     50,653
   128     -8      64        -512        174    +38   1,444     54,872
   131     -5      25        -125        183    +47   2,209    103,823
   132     -4      16         -64        197    +61   3,721    226,981
   135     -1       1          -1        202    +66   4,356    287,496
   135     -1       1          -1

Positive deviations:   22 numbers    +629   22,795     +988,637
Negative deviations:   23 numbers    -632   25,982   -1,256,414
Totals:                45 numbers      -3   48,777     -267,777
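Origin at 136.

$$\bar{x} = -\frac{3}{45}. \qquad \text{Average } 136 - \frac{3}{45} = 135.93$$

$$m'_2 = \frac{48777}{45} = 1083.933 \qquad m_2 = m'_2 - \bar{x}^2 = 1083.929 \qquad \sigma = \sqrt{m_2} = 32.9$$

$$m'_3 = -\frac{267777}{45} = -5951 \qquad m_3 = m'_3 - 3\bar{x} m'_2 + 2\bar{x}^3 = -5734$$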
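Where each observation has its own value, as here, the same moments can be formed without grading. The following Python sketch is offered as a check on the figures above and is not the author's working. The list of 45 numbers is read from the table, and one entry is an assumption: the second 135 is inferred from the column totals (23 negative deviations summing to -632), because its row is not legible in the source.

```python
from math import sqrt

# The 45 index numbers as read from the table. The second 135 is an
# assumption inferred from the column totals; its row is illegible.
index_numbers = [
    68, 71, 84, 86, 93, 96, 100, 100, 101, 104, 104, 107, 114, 114, 119,
    121, 125, 128, 128, 131, 132, 135, 135,
    138, 148, 148, 153, 154, 154, 157, 159, 159, 160, 161, 163, 163, 166,
    168, 169, 172, 173, 174, 183, 197, 202,
]
origin = 136
deviations = [value - origin for value in index_numbers]
n = len(deviations)

xbar = sum(deviations) / n                      # first moment about the origin
raw2 = sum(d ** 2 for d in deviations) / n      # m'_2
raw3 = sum(d ** 3 for d in deviations) / n      # m'_3
m2 = raw2 - xbar ** 2
m3 = raw3 - 3 * xbar * raw2 + 2 * xbar ** 3
sigma = sqrt(m2)
average = origin + xbar

print(n, round(average, 2), round(sigma, 1), round(m3))
```

Choosing the origin close to the average keeps the deviations small, which is the reason the text takes 136 rather than zero.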
3. Observations of the Right Ascension of the Pole Star.*

Seconds from             Number of
assumed mean.     x    Observations.     xy     x²y      x³y      x⁴y
                             y

   +3.0           6          1            6      36      216    1,296
   +2.5           5          5           25     125      625    3,125
   +2.0           4         16           64     256    1,024    4,096
   +1.5           3         38          114     342    1,026    3,078
   +1.0           2         63          126     252      504    1,008
   +0.5           1         72           72      72       72       72
    0.0           0         82            0       0        0        0
   -0.5          -1         73          -73      73      -73       73
   -1.0          -2         61         -122     244     -488      976
   -1.5          -3         36         -108     324     -972    2,916
   -2.0          -4         21          -84     336   -1,344    5,376
   -2.5          -5         12          -60     300   -1,500    7,500
   -3.0          -6          6          -36     216   -1,296    7,776
   -3.5          -7          1           -7      49     -343    2,401

Totals                     487         +407   2,625   +3,467   39,693
                                       -490           -6,016
                                        -83           -2,549

$$\bar{x} = -.170$$

$$m_2 = 5.390 - .029 = 5.361. \qquad \sigma = 2.3$$

$$m_3 = -5.234 - 3(-.170) \times 5.390 + 2(-.170)^3 = -2.49 \qquad \kappa = -.2$$

$$m_4 = 81.505 - 4(-.170)(-5.234) + 6(.170)^2(5.390) - 3(.170)^4 = 78.88 \qquad \kappa_2 = 2.7$$

These  observations  have  been  frequently  used  in  discussing 
how  far  physical  observations  can  be  expressed  by  the  normal 
curve. The results are nearly symmetrical, but since $\kappa_2 < 3$
there  is  an  over-concentration  near  the  average. 


4. The following example shows how a table of chances can
be treated as a frequency group; an unsymmetrical case has
been selected, namely the chance of obtaining sixes in a throw
of 12 dice; e.g., the chance of exactly 3 sixes is
${}_{12}C_3 \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^9$; see p. 262.

Number of Sixes.     Chance in 12 throws.
       x             y (each ÷ 6¹²)

       0               244,140,625
       1               585,937,500
       2               644,531,250
       3               429,687,500
       4               193,359,375
       5                61,875,000
       6                14,437,500
       7                 2,475,000
       8                   309,375
       9                    27,500
      10                     1,650
      11                        60
      12                         1

Total                2,176,782,336

$$\bar{x} = 2 \qquad m_2 = 1\tfrac{2}{3} \qquad m_3 = 1\tfrac{1}{9} \qquad m_4 = 8\tfrac{11}{18}$$

$$\sigma = 1.29 \qquad \kappa = .516 \qquad \kappa_2 = 3.1$$


* Quetelet, Lettres sur la théorie des probabilités, p. 128.
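The table of chances for the 12 dice in Example 4 can be rebuilt exactly with rational arithmetic. The sketch below is an illustration in Python under the usual assumption of fair, independent dice; the variable names are not from the book.

```python
from fractions import Fraction
from math import comb

n, p = 12, Fraction(1, 6)          # 12 dice, chance of a six on each
q = 1 - p

# Chance of exactly r sixes: nCr * p^r * q^(n - r), kept as exact fractions.
chances = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
numerators = [(c * 6 ** n).numerator for c in chances]   # the tabulated integers

mean = sum(r * c for r, c in enumerate(chances))


def central_moment(t):
    """t-th moment of the number of sixes about its average."""
    return sum((r - mean) ** t * c for r, c in enumerate(chances))


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
sigma = float(m2) ** 0.5
kappa = float(m3) / sigma ** 3
kappa2 = m4 / m2 ** 2

print(numerators)
print(mean, m2, m3, m4, round(sigma, 2), round(kappa, 3), float(kappa2))
```

Keeping the chances as fractions avoids any rounding until the square root is taken, so the moments come out as the simple fractions of the example.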
5. If digits are selected at random their average may be
expected to tend to 4.5. The group in the table below shows
the result of selecting 400 groups of 25 each from the last
digits in 7 figure logarithm tables. The group is somewhat
unsymmetrical and $\kappa_2 > 3$.

Sum of 25 Digits, divided by 5.

Difference          Number of
from 22.5.            Cases.

 Over 9                  1
  8 to  9                5
  7 to  8                9
  6 to  7                5
  5 to  6               12
  4 to  5               10
  3 to  4               15
  2 to  3               36
  1 to  2               48
  0 to  1               57
  0 to -1               62
 -1 to -2               58
 -2 to -3               39
 -3 to -4               17
 -4 to -5               13
 -5 to -6               10
 -6 to -7                2
 -7 to -8                1

Total                  400

With origin 23,

$\bar{x} = -.2575$; average, 22.7425
$m_2 = 8.8662$; corrected, 8.783
$\sigma = 2.964$
$m_3 = 13.584$; $\kappa = .522$
$m_4 = 274.24$; corrected, 269.8
$\kappa_2 = 3.50$

Mr. Elderton * gives a method of calculating moments specially
suited for work on an adding and multiplying machine,
which may be expressed as follows in the notation of this
chapter.

Let $y_1$, $y_2$ . . . $y_t$ be the frequencies at $x = 1, 2 \ldots t$.

Write ${}_0S_1 = y_t$, ${}_0S_2 = y_t + y_{t-1}$, . . . ${}_0S_t = y_t + y_{t-1} + \ldots + y_1$.

Also write

$${}_1S_2 = {}_0S_1 + {}_0S_2,\quad {}_1S_3 = {}_0S_1 + {}_0S_2 + {}_0S_3,\ \ldots,\ {}_1S_t = {}_0S_1 + {}_0S_2 + \ldots + {}_0S_t,$$

and

$${}_2S_2 = {}_1S_1 + {}_1S_2,\ \ldots,\ {}_2S_t = {}_1S_1 + {}_1S_2 + \ldots + {}_1S_t, \text{ and so on.}$$

${}_0S_t$ = number of observations $= n$.

$${}_1S_t = t y_t + (t - 1) y_{t-1} + \ldots + 1 \cdot y_1 = n\bar{x},$$

where $\bar{x}$ is the average, $= m'_1$.

$${}_2S_t = (1 + 2 + \ldots + t) y_t + (1 + 2 + \ldots + \overline{t-1}) y_{t-1} + \ldots + (1 + 2) y_2 + y_1$$

$$= \frac{t(t+1)}{2} y_t + \frac{(t-1)t}{2} y_{t-1} + \ldots + \frac{1 \cdot 2}{2} y_1 = \frac{n}{2}(m'_2 + m'_1),$$

where $m'_1$, $m'_2$ . . . are moments about the origin.

$${}_3S_t = \tfrac{1}{2}\{1 \cdot 2 + 2 \cdot 3 + \ldots + t(t+1)\} y_t + \tfrac{1}{2}\{1 \cdot 2 + \ldots + (t-1)t\} y_{t-1} + \ldots$$

$$= \tfrac{1}{6}\{t(t+1)(t+2) y_t + (t-1)t(t+1) y_{t-1} + \ldots + 1 \cdot 2 \cdot 3\, y_1\}$$

$$= \frac{n}{6}(m'_3 + 3m'_2 + 2m'_1), \text{ and}$$

$${}_4S_t = \frac{n}{24}(m'_4 + 6m'_3 + 11m'_2 + 6m'_1).$$

Then by the use of equations 5, 7, and 9 we find

$$m_2 = \frac{2}{n} \cdot {}_2S_t - \bar{x}(1 + \bar{x})$$

$$m_3 = \frac{6}{n} \cdot {}_3S_t - 3m_2(1 + \bar{x}) - \bar{x}(1 + \bar{x})(2 + \bar{x})$$

$$m_4 = \frac{24}{n} \cdot {}_4S_t - 2m_3(3 + 2\bar{x}) - m_2(11 + 18\bar{x} + 6\bar{x}^2) - \bar{x}(1 + \bar{x})(2 + \bar{x})(3 + \bar{x}).$$

The quantities ${}_1S_t$, ${}_2S_t$, ${}_3S_t$, ${}_4S_t$ are quickly obtained by
repeated addition. The process is exhibited sufficiently by
working out the moments of Example 5 (p. 256) by this method.

* Frequency Curves and Correlation, pp. 19-23. On p. 23 Mr. Elderton shows
how to use an origin near the centre, thereby saving numerical work. See
also Hardy, The Theory of the Construction of Tables of Mortality, pp. 59 seq.

$x$ is measured from the origin 14; $y_x$ is the number of cases
at $x$. Each term in the column ${}_0S_{x'}$ is obtained by adding the
terms in the previous column that stand to the left and above
it; the column ${}_1S_{x'}$ is obtained similarly from the column
${}_0S_{x'}$ and so on. $t = 18$. Write $19 - x = x'$.

Sum of digits
    ÷ 5          x     y_x    0S_x'    1S_x'     2S_x'     3S_x'

 Over 31.5      18       1       1        1         1         1
      30.5      17       5       6        7         8         9
      29.5      16       9      15       22        30        39
      28.5      15       5      20       42        72       111
      27.5      14      12      32       74       146       257
      26.5      13      10      42      116       262       519
      25.5      12      15      57      173       435       954
      24.5      11      36      93      266       701     1,655
      23.5      10      48     141      407     1,108     2,763
      22.5       9      57     198      605     1,713     4,476
      21.5       8      62     260      865     2,578     7,054
      20.5       7      58     318    1,183     3,761    10,815
      19.5       6      39     357    1,540     5,301    16,116
      18.5       5      17     374    1,914     7,215    23,331
      17.5       4      13     387    2,301     9,516    32,847
      16.5       3      10     397    2,698    12,214    45,061
      15.5       2       2     399    3,097    15,311    60,372
      14.5       1       1     400    3,497    18,808    79,180

 Totals                400   3,497   18,808    79,180   285,560
$$\bar{x} = \frac{3497}{400} = 8.7425 \qquad \text{Average} = 22.7425$$

$$m_2 = \frac{2}{400} \times 18808 - 8.7425 \times 9.7425 = 8.8662$$

$$m_3 = \frac{6}{400} \times 79180 - 3 \times 8.8662 \times 9.7425 - 8.7425 \times 9.7425 \times 10.7425 = 13.584$$

$$m_4 = \frac{24}{400} \times 285560 - 2 \times 13.584 \times 20.485 - 8.8662 \times 626.95 - 10744.15 = 274.24$$

CHAPTER  II. 


ALGEBRAIC  PROBABILITY  AND  THE  NORMAL  CURVE 

OF  ERROR. 

Elementary  Principles. 

The method and fundamental theorems of algebraic probability
may be summarised as follows:

Suppose  that  there  are  N  alternative  events,  any  one  of 
which  is  just  as  likely  to  take  place  as  any  other,  and  that 
one  of  them  is  known  to  have  taken  place,  but  we  are  in 
complete  ignorance  which  ;  further,  of  the  N  events  suppose 
that  M  have  a  special  characteristic  and  the  remaining  (N  —  M) 
have not; then the chance that the event that has happened
has this characteristic is defined as $\frac{M}{N}$.

Thus, if one card has been drawn from an ordinary pack of
52, the chance that it is a heart is $\frac{13}{52} = \frac{1}{4}$. Here each of the
52  events  is  so  far  as  we  know  equally  likely,  and  the  skill  of 
the  card  manufacturer  is  directed  to  make  the  cards  of  equal 
weight and with equal friction. We cannot point to any circumstance
which tends to give one card rather than another, unless
the surface friction of an ace is less than that of a king. In
an ideal system there is nothing to distinguish the circumstances
that lead to one of the N events rather than another.
In  the  apparatus  of  fair  games  of  chance  this  equality  is 
definitely  aimed  at,  and  consequently  such  games  supply 
illustrations  of  algebraic  probability. 

Let $p = \frac{M}{N}$; $q = 1 - p = \frac{N - M}{N}$. $q$ is the chance that the
characteristic will not be found. If we call the appearance of the
characteristic a "success," $p$ is the chance of success, $q$ is the
chance of failure; the odds in favour are $p$ to $q$, against $q$ to $p$.
Multiplication of Chances.

If $p_1$, $p_2$ are the chances of success in two independent experiments,
then $p_1 \times p_2$ can be shown as follows to be the chance of a
double success.

In one experiment let there be $n_1$ equally likely alternative
events, and in the other $n_2$. Write $p_1 = \frac{m_1}{n_1}$, $p_2 = \frac{m_2}{n_2}$.

By independence here we mean that the result of the first
experiment has no effect on the second experiment, so that each of
the $n_1 \times n_2$ possible double events is equally likely.

Of these $n_1 \times n_2$ events
    $m_1 \times m_2$ give a double success,
    $m_1 \times (n_2 - m_2)$ give success and failure,
    $(n_1 - m_1) \times m_2$ give failure and success,
    $(n_1 - m_1) \times (n_2 - m_2)$ give double failure.

Of $n_1n_2$ equally likely events $m_1m_2$ give a double success and
the remainder do not. Hence $p$, the chance of double success,

$$= \frac{m_1m_2}{n_1n_2} = p_1 \times p_2.$$

E.g. the chance that two sixes will be thrown by a pair of dice
is $\frac{1}{6} \times \frac{1}{6} = \frac{1}{36}$.

If,  however,  the  experiments  are  not  independent,  but  the 
result  of  the  first  affects  the  chances  in  the  second,  the  formula 
must  be  modified  in  the  way  illustrated  by  the  following 
example. 

If a card is drawn from each of two packs the chance
of drawing two aces is $p_1 \times p_2$, where $p_1 = \frac{4}{52} = p_2$.

But  if  the  second  card  is  drawn  from  a  pack  from  which 
the  first  has  already  been  taken,  we  have  the  following 
alternatives : — 

There  are  52  x  51  possible  events. 

If  an  ace  is  drawn  first,  there  are  3  aces  in  the  remaining  51. 

$4 \times 3$ ways give a double success; $4 \times 48$ give success
and failure; $48 \times 4$ give failure and success, and $48 \times 47$ give
double failure.

The chance of a double success is therefore $\frac{4}{52} \times \frac{3}{51} = \frac{1}{221}$.

This problem may also be worked out as follows. There
are ${}_{52}C_2 = \frac{52 \cdot 51}{1 \cdot 2}$ pairs in the pack. Of these ${}_{4}C_2 = \frac{4 \cdot 3}{1 \cdot 2}$ are
two aces. Any pair is as likely to be drawn as any other.
Hence the chance of drawing two aces, whether together or
consecutively, is $\frac{{}_{4}C_2}{{}_{52}C_2} = \frac{4 \cdot 3}{52 \cdot 51}$.
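Both ways of reaching the chance of two aces can be confirmed by direct enumeration. The Python sketch below is illustrative only; it models the pack as 52 labels of which the first four stand for the aces, which is an assumption of the sketch and not anything in the text.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

# The pack is modelled as the labels 0..51; labels 0..3 stand for the aces.
pack = range(52)
aces = set(range(4))

# Every ordered pair of different cards: 52 x 51 equally likely events.
ordered_draws = list(permutations(pack, 2))
double_success = sum(1 for first, second in ordered_draws
                     if first in aces and second in aces)

chance_by_counting = Fraction(double_success, len(ordered_draws))
chance_by_product = Fraction(4, 52) * Fraction(3, 51)      # dependent draws
chance_by_pairs = Fraction(comb(4, 2), comb(52, 2))        # unordered pairs

print(len(ordered_draws), double_success, chance_by_counting)
```

The count of ordered draws and the count of unordered pairs differ by the factor 2 in both numerator and denominator, which is why the two methods of the text agree.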

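The chance of obtaining 8 hearts and 5 cards of other suits
in a hand of 13 cards dealt from 52 is

$$\frac{{}_{13}C_8 \times {}_{39}C_5}{{}_{52}C_{13}} = \frac{13 \cdot 12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 39 \cdot 38 \cdot 37 \cdot 36 \cdot 35 \;(13!)}{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48 \cdot 47 \cdot 46 \cdot 45 \cdot 44 \cdot 43 \cdot 42 \cdot 41 \cdot 40 \;(8!)\,(5!)}$$

$$= \frac{105,857,037}{90,716,222,800} = \frac{1}{857} \text{ approximately;}$$

for there are ${}_{52}C_{13}$ equally likely hands $= N$; there are ${}_{13}C_8$
equally likely groups of 8 hearts and ${}_{39}C_5$ equally likely groups
of 5 from other suits, and $M = {}_{13}C_8 \times {}_{39}C_5$, where $p = \frac{M}{N}$.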
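The numbers M and N of this example are too large for comfortable hand working but are readily formed by machine. A short Python sketch, given as an illustration with variable names that follow the text:

```python
from fractions import Fraction
from math import comb

M = comb(13, 8) * comb(39, 5)   # hands with 8 hearts and 5 cards of other suits
N = comb(52, 13)                # all equally likely hands of 13 cards
p = Fraction(M, N)

print(M, N, float(p), float(1 / p))
```

The reciprocal of p shows directly how many hands, on the average, must be dealt for one such hand to be expected.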

Addition  of  Chances. 

The total 9 can be obtained from the throw of two dice
from any of the pairs (3, 6) (4, 5) (5, 4) (6, 3); that is, of
36 equally probable events 4 give the result, and the chance is
therefore $\frac{4}{36} = \frac{1}{9}$.

This result may also be obtained thus: the chance of throwing
3 is $\frac{1}{6}$, of throwing 6 is $\frac{1}{6}$, and therefore the chance of throwing
3 and 6 is $\frac{1}{36}$. Similarly the chance of throwing (4, 5) (5, 4)
and (6, 3) is $\frac{1}{36}$ in each case. The whole chance is the sum
of the chances of these alternative double events.

Generally if a success can be obtained either from an
occurrence whose chance is $p_1$ followed by one whose chance
is $p_1'$, or from successive occurrences whose chances are
$p_2$, $p_2'$ . . . , then the whole chance of a success is

$$P = p_1p_1' + p_2p_2' + \ldots$$
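The addition of chances for the total 9 can be verified by listing the 36 throws. The sketch below is an illustration in Python, assuming two fair dice; the names are not from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally probable throws of two dice, as ordered pairs.
throws = list(product(range(1, 7), repeat=2))
favourable = [throw for throw in throws if sum(throw) == 9]

# First method: count the favourable events among the 36.
chance_by_counting = Fraction(len(favourable), len(throws))

# Second method: add the chances p1 * p1' of the alternative double events.
chance_by_addition = sum(Fraction(1, 6) * Fraction(1, 6) for _ in favourable)

print(favourable, chance_by_counting, chance_by_addition)
```

The two methods must agree because the alternative double events are mutually exclusive, so that their chances may be added without correction.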

Deduction  of  the  Normal  Law  of  Error. 

We can now proceed to a general theorem of great importance
alike in the theory of probability itself and in its
application  to  statistics. 

Suppose  an  experiment  (e.g.  throwing  dice,  drawing  a  card, 
or  choosing  a  number)  to  be  such  that  the  chance  of  success  is 
always  p  and  of  failure  q,  so  that  p  +  q  =  1. 

Let  the  experiment  be  repeated  n  times,  and  consider  the 
chance  of  obtaining  r  successes  and  n-r  failures.  The  chance 
in an order assigned thus, the first $r$ experiments successes and
the rest failures, is

$$p \times p \times \ldots \text{ to } r \text{ factors} \times q \times q \times \ldots \text{ to } n - r \text{ factors} = p^r \times q^{n-r};$$

and the chance in any other assigned order is the same. The
order may be assigned by choosing any $r$ positions for
successes in a series of $n$ experiments, i.e. in ${}_nC_r$ ways. Hence
the whole chance is ${}_nC_r \cdot p^r q^{n-r}$.

The chances of 0, 1, 2 . . . $n$ successes are therefore the
successive terms of the binomial expansion

$$1 = (q + p)^n = q^n + n q^{n-1} p + \ldots + {}_nC_r \cdot q^{n-r} p^r + \ldots + n q p^{n-1} + p^n .$$

For example, if $p = \frac{2}{5}$, $q = \frac{3}{5}$ and $n = 10$ we have


  r     nCr     p^r q^(n-r)          nCr · p^r q^(n-r)
                (each ÷ 5¹⁰)

  0       1     3¹⁰                  .0060466176
  1      10     2  × 3⁹              .0403107840
  2      45     2² × 3⁸              .1209323520
  3     120     2³ × 3⁷              .2149908480
  4     210     2⁴ × 3⁶              .2508226560
  5     252     2⁵ × 3⁵              .2006581248
  6     210     2⁶ × 3⁴              .1114767360
  7     120     2⁷ × 3³              .0424673280
  8      45     2⁸ × 3²              .0106168320
  9      10     2⁹ × 3¹              .0015728640
 10       1     2¹⁰                  .0001048576

                                    1.0000000000

[Diagram of the chances in the table against the number of successes.]
The vertical scale is expanded 100 fold so that the area of the figure
is 100 squares on unit base.
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The  diagram  illustrates  the  relative  chances  ot  different 
numbers  of  successes,  and  exhibits  them  as  a  frequency  group. 
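The tabulated chances for p = 2/5, n = 10 can be checked with exact rational arithmetic. What follows is a minimal sketch, assuming Python's standard `fractions` module; the helper name `binomial_terms` is illustrative and does not come from the text.

```python
from fractions import Fraction
from math import comb


def binomial_terms(p, n):
    """Return the exact chances of 0, 1, ..., n successes in n trials."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]


# p = 2/5, q = 3/5, n = 10, as in the worked example.
terms = binomial_terms(Fraction(2, 5), 10)

for r, chance in enumerate(terms):
    # Columns: r, nCr, and the chance of exactly r successes.
    print(f"{r:2d}  {comb(10, r):3d}  {float(chance):.10f}")

print("total =", sum(terms))
```

Because the denominators are all powers of 5, each chance terminates after ten decimal places, which is why the table can be given exactly.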

We will first find the moments of the group for general values of p and n. Take the horizontal scale on the diagram as the scale for r.

Suppose the n-fold experiment repeated N times, where N is a very large number. Then the number of times r successes are obtained tends to be

$$N \times {}^{n}C_{r}\, q^{n-r}p^{r} = y_r, \text{ say},$$

and $y_0 + y_1 + \dots = N(q+p)^{n} = N$, since $p + q = 1$.

$\bar{x}$, the first moment about the origin,

$$\begin{aligned}
&= (y_0 \times 0 + y_1 \times 1 + \dots + y_r \times r + \dots + y_n \times n) \div N \\
&= n\,q^{n-1}p + \frac{n(n-1)}{2}\,q^{n-2}p^{2} \times 2 + \dots + {}^{n}C_{r}\,q^{n-r}p^{r} \times r + \dots + p^{n} \times n \\
&= np\,(q+p)^{n-1} = np. \qquad (11)
\end{aligned}$$

$$\begin{aligned}
m_2' &= (y_0 \times 0^2 + y_1 \times 1^2 + \dots + y_r \times r^2 + \dots + y_n \times n^2) \div N \\
&= \sum_{0}^{n} r^2\, {}^{n}C_{r}\, q^{n-r}p^{r} = \sum_{0}^{n} \{r(r-1) + r\}\, {}^{n}C_{r}\, q^{n-r}p^{r} \\
&= n(n-1)p^2 \sum_{2}^{n} {}^{n-2}C_{r-2}\, q^{n-r}p^{r-2} + np \sum_{1}^{n} {}^{n-1}C_{r-1}\, q^{n-r}p^{r-1} \\
&= n(n-1)p^2 (q+p)^{n-2} + np\,(q+p)^{n-1} = n(n-1)p^2 + np \\
&= n^2p^2 + np(1-p) = \bar{x}^2 + npq \qquad (12)
\end{aligned}$$

and $m_2$, the second moment about the average,

$$= m_2' - \bar{x}^2 = npq = np(1-p). \qquad (13)$$

In a similar way

$$m_3' = \sum_{0}^{n} r^3\, {}^{n}C_{r}\, q^{n-r}p^{r} = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np \qquad (14)$$

and $m_3$, the third moment about the average,

$$\begin{aligned}
&= m_3' - 3m_2'\bar{x} + 2\bar{x}^3 \\
&= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np - 3n^3p^3 - 3n^2p^2(1-p) + 2n^3p^3 \\
&= np(2p^2 - 3p + 1) = np(1-p)(1-2p) = npq(q-p). \qquad (15)
\end{aligned}$$

$m_4' = \sum_{0}^{n} r^4\, {}^{n}C_{r}\, q^{n-r}p^{r}$, and $m_4$ can be shown to equal

$$3(pqn)^2 + pqn(1 - 6pq).$$

Hence, using the formulae of pp. 251-2,

$$\sigma = \sqrt{pqn}, \quad \beta_1 = \frac{(q-p)^2}{pqn}, \quad \beta_2 - 3 = \frac{1-6pq}{pqn}, \quad \kappa = \sqrt{\beta_1} = \frac{q-p}{\sqrt{pqn}}, \quad i = \frac{1-6pq}{4pqn}.$$

The standard deviation varies as $\sqrt{n}$. $\kappa$ and $\sqrt{\beta_1}$, measurements of skewness, are small when $\sqrt{n}$ is great. $(\beta_2 - 3)$ and $i$ are small when n is great.
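Formulae (11), (13) and (15), and the stated value of the fourth moment, can be verified by direct summation over the binomial terms. The sketch below is an illustration only: it uses exact fractions, and the function name `moments_about_average` is an assumption introduced here.

```python
from fractions import Fraction
from math import comb


def moments_about_average(p, n):
    """Return the average and the 2nd, 3rd, 4th moments about it."""
    q = 1 - p
    chances = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    average = sum(r * c for r, c in enumerate(chances))
    moments = {
        s: sum((r - average) ** s * c for r, c in enumerate(chances))
        for s in (2, 3, 4)
    }
    return average, moments


p, n = Fraction(2, 5), 10
q = 1 - p
average, m = moments_about_average(p, n)

print(average == n * p)                                     # formula (11)
print(m[2] == n * p * q)                                    # formula (13)
print(m[3] == n * p * q * (q - p))                          # formula (15)
print(m[4] == 3 * (p * q * n) ** 2 + p * q * n * (1 - 6 * p * q))
```

Any other rational p and positive integer n may be substituted; the four comparisons are identities and should each print `True`.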

Next  consider  the  chance  of  r  successes  and  the  shape 
assumed  by  the  diagram  when  n  is  increased. 


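Case I., when p = q = ½ and n is even = 2n'.

Let $P_x$ be the chance of n' + x successes, and therefore n' − x failures.

$$\begin{aligned}
P_x &= {}^{2n'}C_{n'+x} \cdot \frac{1}{2^{2n'}} = \frac{(2n')!}{(n'+x)!\,(n'-x)!} \cdot \frac{1}{2^{2n'}} \\
&= \frac{(2n')!}{n'!\,n'!} \cdot \frac{1}{2^{2n'}} \cdot \frac{n'(n'-1)\dots(n'-x+1)}{(n'+1)(n'+2)\dots(n'+x)}.
\end{aligned}$$

$$P_0 = \frac{(2n')!}{2^{2n'}\,n'!\,n'!} = \frac{1}{\sqrt{\pi n'}}, \text{ correct to } \frac{1}{n'} \text{ (Appendix, Note 1 (132)), by Wallis's Theorem.}$$

$$\begin{aligned}
\log(P_x/P_0) &= \log \frac{\left(1-\frac{1}{n'}\right)\left(1-\frac{2}{n'}\right)\dots\left(1-\frac{x-1}{n'}\right)}{\left(1+\frac{1}{n'}\right)\left(1+\frac{2}{n'}\right)\dots\left(1+\frac{x}{n'}\right)} \\
&= -2\left\{\frac{1+2+\dots+x}{n'} + \frac{1}{3}\cdot\frac{1^3+2^3+\dots+x^3}{n'^3} + \dots + \frac{1}{2t+1}\cdot\frac{1^{2t+1}+2^{2t+1}+\dots+x^{2t+1}}{n'^{2t+1}} + \dots\right\} - \log\left(1-\frac{x}{n'}\right),
\end{aligned}$$

where t is any integer,*

$$= -\frac{x(x+1)}{n'} - \frac{2}{3}\cdot\frac{x^2(x+1)^2}{4\,n'^3} - \dots - \frac{2}{2t+1}\cdot\frac{x^{2t+2}+\dots}{(2t+2)\,n'^{2t+1}} - \dots + \frac{x}{n'} + \frac{x^2}{2n'^2} + \dots$$

Write $x = \tau\sqrt{n'} = \tau c$, since from p. 252 $c^2 = 2pqn = 2 \cdot \tfrac12 \cdot \tfrac12 \cdot 2n' = n'$.

$$\log(P_x/P_0) = -\tau^2 + \frac{\tau^2}{2n'} + \dots - \frac{2\,\tau^{2t+2}}{(2t+1)(2t+2)\,n'^{t}} - \dots = -\tau^2 + \text{terms involving } \frac{1}{n'}.$$

Hence if $\frac{1}{n'}$ is neglected, as in the value of $P_0$ above,

$$P_x = \frac{1}{\sqrt{\pi n'}}\,e^{-\tau^2} = \frac{1}{\sqrt{\pi n'}}\,e^{-x^2/n'} = \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)} \qquad (16)$$

since σ, the standard deviation, $= c/\sqrt{2}$.

And since $c^2 = \frac{n}{2}$, $P_x = \frac{2}{\sqrt{2\pi n}}\,e^{-2x^2/n}$.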

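Case II., when p and q are unequal.

Let $P_x$ be the chance of pn + x successes,† and therefore qn − x failures.

$$\begin{aligned}
P_x &= \frac{n!}{(pn+x)!\,(qn-x)!}\; p^{pn+x} q^{qn-x} \\
&= \frac{n!}{(pn)!\,(qn)!}\; p^{pn} q^{qn} \cdot \frac{qn(qn-1)\dots(qn-x+1)}{(pn+1)(pn+2)\dots(pn+x)} \cdot \frac{p^x}{q^x}.
\end{aligned}$$

* Appendix 2, formula (133).

† It is assumed for simplicity in the sequel that pn is integral and therefore $P_0$ the greatest term; since n is large and powers of $\frac{1}{n}$ are finally neglected, the proof is not affected.

$$\begin{aligned}
\log(P_x/P_0) &= \sum_{s=1}^{s=x}\log\left(1-\frac{s}{qn}\right) - \sum_{s=1}^{s=x}\log\left(1+\frac{s}{pn}\right) - \log\left(1-\frac{x}{qn}\right) \\
&= -\sum s\left(\frac{1}{qn}+\frac{1}{pn}\right) - \frac{1}{2}\sum s^2\left(\frac{1}{q^2n^2}-\frac{1}{p^2n^2}\right) - \dots - \frac{1}{t}\sum s^t\left(\frac{1}{q^tn^t} \pm \frac{1}{p^tn^t}\right) - \dots - \log\left(1-\frac{x}{qn}\right) \\
&= -\frac{x(x+1)}{2}\cdot\frac{p+q}{pqn} - \frac{x(x+1)(2x+1)}{6}\cdot\frac{p^2-q^2}{2p^2q^2n^2} - \frac{x^2(x+1)^2}{4}\cdot\frac{p^3+q^3}{3p^3q^3n^3} - \dots \\
&\qquad - \frac{1}{t}\cdot\frac{x^{t+1}+\dots}{t+1}\cdot\frac{p^t \pm q^t}{p^tq^tn^t} - \dots + \left(\frac{x}{qn} + \frac{x^2}{2q^2n^2} + \dots\right).
\end{aligned}$$

Write $x = \tau c$, where $c^2 = 2pqn = 2\sigma^2$.

$$\begin{aligned}
\log(P_x/P_0) &= -\frac{\tau^2c^2+\tau c}{c^2} - \frac{2\tau^3c^3+3\tau^2c^2+\tau c}{3c^4}\,(p-q) - \frac{2}{3}\cdot\frac{\tau^4c^4+2\tau^3c^3+\tau^2c^2}{c^6}\,(1-3pq) - \dots \\
&\qquad - \frac{\tau^{t+1}c^{t+1}+\dots}{t(t+1)\,2^{-t}c^{2t}}\,(p^t \pm q^t) - \dots + \frac{2\tau c\,p}{c^2} + \frac{2\tau^2c^2p^2}{c^4} + \dots \\
&= -\tau^2 + \frac{1}{c}\left\{-\tau + \frac{2\tau^3}{3}(q-p) + 2p\tau\right\}, \text{ since } p+q=1, \\
&\qquad + \frac{1}{c^2}\left\{\tau^2(q-p) - \frac{2\tau^4}{3}(1-3pq) + 2p^2\tau^2\right\} + \text{terms involving } \frac{1}{c^3}.
\end{aligned}$$

Regard τ as finite; that is, consider only those values of x which are comparable with $\sqrt{pqn}$.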

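If we neglect $\frac{1}{c}$ (that is, if we neglect $\frac{1}{\sqrt{n}}$), we have

$$P_x = P_0 e^{-\tau^2} = P_0 e^{-x^2/c^2}. \qquad (17)$$

If we keep $\frac{1}{c}$, neglecting $\frac{1}{c^2}$ (that is, neglecting $\frac{1}{n}$), we have

$$\begin{aligned}
P_x &= P_0 e^{-\tau^2} \cdot e^{-\frac{q-p}{c}\left(\tau - \frac{2}{3}\tau^3\right)} \\
&= P_0 e^{-\tau^2}\left\{1 - \frac{q-p}{c}\left(\tau - \tfrac{2}{3}\tau^3\right)\right\}, \text{ since } \frac{1}{c^2} \text{ is neglected,} \\
&= P_0 e^{-x^2/c^2}\left\{1 - \frac{q-p}{c}\left(\frac{x}{c} - \frac{2x^3}{3c^3}\right)\right\} \\
&= P_0 e^{-x^2/(2\sigma^2)}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{\sigma} - \frac{x^3}{3\sigma^3}\right)\right\} \qquad (18)
\end{aligned}$$

since $c = \sqrt{2}\,\sigma = \sqrt{2pqn}$, and $\kappa = \frac{q-p}{\sigma}$.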


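The value of $P_0$ may be obtained from Stirling's theorem for factorials (Appendix, Note 3 (134)), viz.: $m! = m^m\sqrt{2\pi m}\;e^{-m+\frac{1}{12m}}$ when $\frac{1}{m^2}$ is neglected, and $= m^m\sqrt{2\pi m}\;e^{-m}$, when $\frac{1}{m}$ is neglected.

$$\begin{aligned}
P_0 &= \frac{n!}{(pn)!\,(qn)!}\; p^{pn}q^{qn} \\
&= \frac{n^n\sqrt{2\pi n}}{(pn)^{pn}(qn)^{qn}\sqrt{2\pi pn \cdot 2\pi qn}}\; e^{-n+pn+qn}\; p^{pn}q^{qn}, \text{ neglecting } \frac{1}{n} \text{ \&c.,} \\
&= \frac{1}{\sqrt{2\pi pqn}}, \text{ since } p+q=1, \; = \frac{1}{c\sqrt{\pi}} = \frac{1}{\sigma\sqrt{2\pi}}.
\end{aligned}$$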

Now write y for $P_x$, and we obtain the equations

$$y = \frac{1}{\sqrt{2\pi pqn}}\,e^{-\frac{x^2}{2pqn}} = \frac{1}{c\sqrt{\pi}}\,e^{-\frac{x^2}{c^2}} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \qquad (19)$$

when $\frac{1}{\sqrt{n}}$ is neglected and

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{\sigma} - \frac{x^3}{3\sigma^3}\right)\right\} \qquad (20)$$

when $\frac{1}{\sqrt{n}}$ is retained and $\frac{1}{n}$ neglected.

These equations express the chances that when an n-fold experiment is made, as described above, the number of successes shall be x in excess of pn, where p is the chance of success in a single experiment.

The curve represented by $y = \frac{1}{c\sqrt{\pi}}\,e^{-x^2/c^2}$ is called the "normal curve of error."* Its shape is shown in the diagram at the end of the book.

An idea can be obtained of the importance of the term in $\frac{1}{\sqrt{n}}$ by taking n = 1000, p = 1/10. Then $\sigma = \sqrt{90} = 9.5$ and $\kappa = .084$ approx. The chance is sensibly affected when x is greater than σ.
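The effect of the term in 1/√n can be seen by setting the exact binomial chance beside the first approximation (19) and the second approximation (20) for n = 1000, p = 1/10. This is only an illustrative sketch; the function names are assumptions introduced here, and the exact chance is computed with fractions before conversion to a float.

```python
from fractions import Fraction
from math import comb, exp, pi, sqrt


def exact_chance(n, p, x):
    """Exact chance of pn + x successes (p a Fraction with pn integral)."""
    r = int(n * p) + x
    return float(comb(n, r) * p**r * (1 - p) ** (n - r))


def first_approximation(n, p, x):
    """Formula (19): the normal curve with sigma = sqrt(pqn)."""
    pf = float(p)
    sigma = sqrt(n * pf * (1 - pf))
    return exp(-x * x / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def second_approximation(n, p, x):
    """Formula (20): the normal curve with the skewness correction."""
    pf = float(p)
    sigma = sqrt(n * pf * (1 - pf))
    kappa = ((1 - pf) - pf) / sigma          # kappa = (q - p) / sigma
    t = x / sigma
    return first_approximation(n, p, x) * (1 - kappa / 2 * (t - t**3 / 3))


n, p = 1000, Fraction(1, 10)
sigma = sqrt(n * float(p) * (1 - float(p)))
kappa = (1 - 2 * float(p)) / sigma
print(f"sigma = {sigma:.3f}, kappa = {kappa:.4f}")

for x in (-20, -10, 0, 10, 20, 25):
    print(x, exact_chance(n, p, x),
          first_approximation(n, p, x),
          second_approximation(n, p, x))
```

For deviations well beyond σ the corrected ordinate should lie noticeably nearer the exact chance than the uncorrected one, which is the point made in the text.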

When n is great the actual chance of one assigned number of successes is small, e.g. if p = ½, n = 1000, the chance of exactly 500 (the most probable number of) successes is only 1/40 approx. The measurement that we find useful, however, is not that of particular ordinates, but of the sum of the chances over a range of values, say from $x_1$ to $x_2$, where $x_2 - x_1$ is of the same order as $\sigma\,(= \sqrt{pqn})$.

By a well-known theorem † we can pass from summation of the ordinates to integration of an area, and the whole chance of a number of successes as great as $pn + x_1$ and not greater than $pn + x_2$ is $\int_{x_1}^{x_2} y\,dx$, where $y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$ and terms involving $\frac{1}{\sqrt{n}}$ are neglected.

Writing z for $\frac{x}{\sigma}$, we have $\int_{x_1}^{x_2} y\,dx = \frac{1}{\sqrt{2\pi}}\int_{z_1}^{z_2} e^{-\frac{1}{2}z^2}\,dz$, and a table suitable for evaluating this function is given on p. 271.

In the following paragraph important constants connected with the function in question are obtained.


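Area of curve $= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz$ = limit of $(p+q)^n$ when n tends to infinity = 1.

$$\therefore \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = \sqrt{2\pi}; \quad \int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}; \quad \int_{-\infty}^{\infty} e^{-au^2}\,du = \sqrt{\frac{\pi}{a}}; \quad \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\,dx = 1.$$

* See Edgeworth, Encyc. Brit. Vol. XXII., article Probability, pp. 391 seq.
† Appendix, Note 4.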


ALGEBRAIC  PROBABILITY  AND  THE  NORMAL  CURVE  OF  ERROR  269 


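Write $m_s = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\,x^s\,dx$, for the sth moment about the average, which is the origin, the area being 1.

The curve is symmetrical about the ordinate through the origin, and $m_{2t+1} = 0$ for all values of t.*

$$\begin{aligned}
m_2 &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^2 e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&= \left[-\frac{\sigma}{\sqrt{2\pi}}\,x\,e^{-\frac{x^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&= 0 + \sigma^2 = \sigma^2 \qquad (21)
\end{aligned}$$

as was already known from formula (13).

$$\begin{aligned}
m_{2t} &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2t} e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&= \left[-\frac{\sigma}{\sqrt{2\pi}}\,x^{2t-1} e^{-\frac{x^2}{2\sigma^2}}\right]_{-\infty}^{\infty} + (2t-1)\sigma^2 \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,x^{2t-2} e^{-\frac{x^2}{2\sigma^2}}\,dx \\
&= 0 + (2t-1)\sigma^2 m_{2t-2}. \qquad (22)
\end{aligned}$$

Hence $m_4 = 3\sigma^2 \cdot m_2 = 3\sigma^4 = 3m_2^2$, and

$$\beta_2 = \frac{m_4}{m_2^2} = 3, \quad \beta_2 - 3 = 0,$$

as may also be obtained from p. 264, when n is infinite.

$$m_{2t} = (2t-1)(2t-3)\dots 3 \cdot 1 \cdot \sigma^{2t}, \text{ by induction}, \; = \frac{(2t)!}{2^t\,t!}\,\sigma^{2t}. \qquad (23)$$

E.g. $m_6 = 15\sigma^6$, $m_8 = 105\sigma^8$.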
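The recurrence (22) and the closed form (23) can be compared for the first few even moments. The following sketch works with the numerical coefficient of $\sigma^{2t}$ only; both function names are hypothetical and introduced purely for the illustration.

```python
from math import factorial


def coefficient_by_recurrence(t):
    """Coefficient of sigma^(2t) in m_2t, built up from formula (22)."""
    coefficient = 1                      # m_0 = 1, the area of the curve
    for k in range(1, t + 1):
        coefficient *= 2 * k - 1         # m_2k = (2k - 1) sigma^2 m_(2k-2)
    return coefficient


def coefficient_closed_form(t):
    """Coefficient of sigma^(2t) in m_2t from formula (23)."""
    return factorial(2 * t) // (2**t * factorial(t))


for t in range(1, 6):
    print(2 * t, coefficient_by_recurrence(t), coefficient_closed_form(t))
```

The two columns agree for every t tried, giving 1, 3, 15, 105, 945 for the second to the tenth moments.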

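η, the mean deviation (see p. 111), since the area is unity,

$$= \frac{2}{\sigma\sqrt{2\pi}}\int_{0}^{\infty} x\,e^{-\frac{x^2}{2\sigma^2}}\,dx = \left[-\frac{2\sigma}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\right]_{0}^{\infty} = \frac{2\sigma}{\sqrt{2\pi}} = \sigma\sqrt{\frac{2}{\pi}}, \quad \text{and} \quad \therefore \frac{\sigma}{\eta} = \sqrt{\frac{\pi}{2}}. \qquad (24)$$

* For $m_{2t+1} = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x^{2t+1} e^{-\frac{x^2}{2\sigma^2}}\,dx = \int_{-\infty}^{\infty} \phi(x)\,dx$, say,

$$\int_{-\infty}^{\infty} \phi(x)\,dx = \int_{0}^{\infty} \phi(x)\,dx - \int_{0}^{\infty} \phi(x')\,dx', \text{ where } x' = -x, \; = 0.$$

The "probable error" (see p. 113) is obtained by finding from the table the value of z which makes $\frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{1}{2}z^2}\,dz = \frac{1}{4}$.

This value has been calculated as $x = z\sigma = .67449\,\sigma$.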

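A drawing of the curve is given at the end of the book. The points of inflection are obtained by equating $D^2y$ to zero, where

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}.$$

Thus

$$\log y + \text{const.} = -\frac{x^2}{2\sigma^2}; \quad \frac{1}{y}\,Dy = -\frac{x}{\sigma^2}; \quad \frac{1}{y}\,D^2y - \frac{1}{y^2}(Dy)^2 = -\frac{1}{\sigma^2};$$

$$\frac{x^2}{\sigma^2} = 1 \text{ at the points of inflection}$$

and $x = \pm\sigma$. (25)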

The area of that part of the curve which stands on the base 0 to σ is, of course, the tabular value of

$$\frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{\sigma} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}}\int_{0}^{1} e^{-\frac{1}{2}z^2}\,dz = F(1) = .3413;$$

and by a similar use of the table we readily find the following approximate values:


Proportion of Area of Curve Standing on Certain Bases.

 Base.          Area.        Base.                  Area.
 0 to  .2σ      .07926       − .2σ  to  + .2σ       .1585
 0 to  .6σ      .2257        + .2σ  to  + .6σ       .1465
 0 to 1.0σ      .3413        + .6σ  to  +1.0σ       .1156
 0 to 1.4σ      .4192        +1.0σ  to  +1.4σ       .0779
 0 to 1.8σ      .4641        +1.4σ  to  +1.8σ       .0449
 0 to 2.2σ      .4861        +1.8σ  to  +2.2σ       .0220
 0 to 2.6σ      .4953        +2.2σ  to  +2.6σ       .0092
 0 to 3.0σ      .49865       +2.6σ  to  +3.0σ       .0033

Note. The mean deviation and probable error are defined in Part I. pp. 111-3.

The mean deviation is the average without regard to sign of the differences between the measurements of the items which make the group and a central measurement (generally the arithmetic average).

The probable error is the distance which measured left and right from a central position includes exactly half the observations.

Table of Values * of $F(z) = \frac{1}{\sqrt{2\pi}}\int_{0}^{z} e^{-\frac{1}{2}z^2}\,dz$

   z    F(z)       z    F(z)       z    F(z)       z    F(z)       z    F(z)
  .00  .0000      .50  .1915     1.00  .3413     1.50  .4332     2.00  .4772
  .01  .0040      .51  .1950     1.01  .3438     1.51  .4345     2.02  .4783
  .02  .0080      .52  .1985     1.02  .3461     1.52  .4357     2.04  .4793
  .03  .0120      .53  .2019     1.03  .3485     1.53  .4370     2.06  .4803
  .04  .0160      .54  .2054     1.04  .3508     1.54  .4382     2.08  .4812
  .05  .0199      .55  .2088     1.05  .3531     1.55  .4394     2.10  .4821
  .06  .0239      .56  .2123     1.06  .3554     1.56  .4406     2.12  .4830
  .07  .0279      .57  .2157     1.07  .3577     1.57  .4418     2.14  .4838
  .08  .0319      .58  .2190     1.08  .3599     1.58  .4429     2.16  .4846
  .09  .0359      .59  .2224     1.09  .3621     1.59  .4441     2.18  .4854
  .10  .0398      .60  .2257     1.10  .3643     1.60  .4452     2.20  .4861
  .11  .0438      .61  .2291     1.11  .3665     1.61  .4463     2.22  .4868
  .12  .0478      .62  .2324     1.12  .3686     1.62  .4474     2.24  .4875
  .13  .0517      .63  .2357     1.13  .3708     1.63  .4484     2.26  .4881
  .14  .0557      .64  .2389     1.14  .3729     1.64  .4495     2.28  .4887
  .15  .0596      .65  .2422     1.15  .3749     1.65  .4505     2.30  .4893
  .16  .0636      .66  .2454     1.16  .3770     1.66  .4515     2.32  .4898
  .17  .0675      .67  .2486     1.17  .3790     1.67  .4525     2.34  .4904
  .18  .0714      .68  .2517     1.18  .3810     1.68  .4535     2.36  .4909
  .19  .0753      .69  .2549     1.19  .3830     1.69  .4545     2.38  .4913
  .20  .0793      .70  .2580     1.20  .3849     1.70  .4554     2.40  .4918
  .21  .0832      .71  .2611     1.21  .3869     1.71  .4564     2.42  .4922
  .22  .0871      .72  .2642     1.22  .3888     1.72  .4573     2.44  .4927
  .23  .0910      .73  .2673     1.23  .3907     1.73  .4582     2.46  .4931
  .24  .0948      .74  .2703     1.24  .3925     1.74  .4591     2.48  .4934
  .25  .0987      .75  .2734     1.25  .3944     1.75  .4599     2.50  .4938
  .26  .1026      .76  .2764     1.26  .3962     1.76  .4608     2.52  .4941
  .27  .1064      .77  .2794     1.27  .3980     1.77  .4616     2.54  .4945
  .28  .1103      .78  .2823     1.28  .3997     1.78  .4625     2.56  .4948
  .29  .1141      .79  .2852     1.29  .4015     1.79  .4633     2.58  .4951
  .30  .1179      .80  .2881     1.30  .4032     1.80  .4641     2.60  .4953
  .31  .1217      .81  .2910     1.31  .4049     1.81  .4649     2.62  .4956
  .32  .1255      .82  .2939     1.32  .4066     1.82  .4656     2.64  .4959
  .33  .1293      .83  .2967     1.33  .4082     1.83  .4664     2.66  .4961
  .34  .1331      .84  .2995     1.34  .4099     1.84  .4671     2.68  .4963
  .35  .1368      .85  .3023     1.35  .4115     1.85  .4678     2.70  .4965
  .36  .1406      .86  .3051     1.36  .4131     1.86  .4686     2.72  .4967
  .37  .1443      .87  .3078     1.37  .4147     1.87  .4693     2.74  .4969
  .38  .1480      .88  .3106     1.38  .4162     1.88  .4699     2.76  .4971
  .39  .1517      .89  .3133     1.39  .4177     1.89  .4706     2.78  .4973
  .40  .1554      .90  .3159     1.40  .4192     1.90  .4713     2.80  .4974
  .41  .1591      .91  .3186     1.41  .4207     1.91  .4719     2.82  .4976
  .42  .1628      .92  .3212     1.42  .4222     1.92  .4726     2.84  .4977
  .43  .1664      .93  .3238     1.43  .4236     1.93  .4732     2.86  .4979
  .44  .1700      .94  .3264     1.44  .4251     1.94  .4738     2.88  .4980
  .45  .1736      .95  .3289     1.45  .4265     1.95  .4744     2.90  .4981
  .46  .1772      .96  .3315     1.46  .4279     1.96  .4750     2.92  .4982
  .47  .1808      .97  .3340     1.47  .4292     1.97  .4756     2.94  .4984
  .48  .1844      .98  .3365     1.48  .4306     1.98  .4761     2.96  .4985
  .49  .1879      .99  .3389     1.49  .4319     1.99  .4767     2.98  .4986

   z    F(z)           z    F(z)           z    F(z)
 3.00  .49865        3.60  .499841       4.50  .499997
 3.20  .49931        3.80  .499928
 3.40  .49966        4.00  .499968

* Based on Dr. Sheppard's 7 figure Tables, Biometrika, Vol. II, Part II.

It has been calculated that F(z) = ¼ when z = .67449 approx. The quartiles of the curve are therefore at

± .67449σ (26)

and it is just as likely as not that a single observation will be within this range as without it. .67449σ is therefore the "probable error" and is frequently used in preference to σ to measure precision.
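Entries of the table, and the multiple .67449 that defines the probable error, can be recomputed from the error function. This is a hedged sketch: it relies on the identity F(z) = ½ erf(z/√2) and on `math.erf` from the Python standard library, and the bisection routine `probable_error_multiple` is a name invented for the illustration.

```python
from math import erf, sqrt


def F(z):
    """Area of the normal curve standing on the base 0 to z (sigma = 1)."""
    return 0.5 * erf(z / sqrt(2))


def probable_error_multiple(tolerance=1e-10):
    """Find by bisection the z for which F(z) = 1/4."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if F(middle) < 0.25:
            low = middle
        else:
            high = middle
    return (low + high) / 2


for z in (0.2, 0.6, 1.0, 1.4, 1.8, 2.2, 2.6, 3.0):
    print(f"F({z}) = {F(z):.5f}")

print(f"probable error = {probable_error_multiple():.5f} sigma")
```

Bisection is used only because F is increasing on this range; any root-finding method would serve equally well.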


Algebraic  Chance  and  Experience. 

The  analysis  so  far  has  been  purely  abstract,  the  illustra¬ 
tions  from  cards  and  dice  only  having  been  taken  to  visualise 
the  phrase  “  equally  likely.”  We  must  now  consider  what 
evidence  there  is  that  successes  do  occur  in  proportion  to  their 
algebraic  probability.  Though  we  should  certainly  be  sur¬ 
prised  if,  in  simple  cases,  successes  were  in  a  different  propor¬ 
tion — if,  for  example,  we  found  that  90  out  of  100  coins  tossed 
fell  head  uppermost,  or  50  repeated  draws  of  one  card  from  a 
complete  pack  (shuffled  after  replacing  each  card  drawn) 
were  all  hearts — yet  this  feeling  hardly  gives  more  than  a 
presumption  that  in  the  universe  there  is  some  method  in 
apparently  chance  events.  We  must  appeal  to  experience  and 
experiment.  In  a  general  way,  it  is  the  experience  of  players 
of  games  of  chance  that  events  do  happen  at  any  rate  roughly 
in  proportion  to  their  algebraic  probabilities  ;  canons  of  correct 
play  in  whist  were  based  on  this,  and  the  odds  were  given  in 
accordance  with  calculated  probability.  Insurance,  both 
accident  and  life,  is  based  on  the  belief  that  events  in  the  bulk 
are  predictable,  though  individual  occurrences  appear  to  be 
fortuitous,  and  this  belief  has  been  continually  justified.  A 
great  number  of  experiments  have  been  carried  out  directly 
for  the  purpose  of  comparing  the  frequency  of  the  occurrence 
of  events  with  their  a  priori  chances,  with  very  marked  success. 
We  can  never,  however,  obtain  a  certainty  that  the  preliminary 
condition  of  equal  probability  is  satisfied  completely,  nor  can 
we  expect  to  obtain  more  than  an  approximate  verification. 

Rough  experiments  can  easily  be  made  by  quite  simple 
means. 

Thus from numerous packs of cards, from which the picture cards had been removed, 4 cards were drawn and the total of the pips on them counted and then the cards replaced. This was done 90 times.

The chance of getting a total of r pips, if the number of packs was so large that the draws of the separate cards in the quartets could be taken as independent, is the coefficient of $x^r$ in $\frac{1}{10^4}(x + x^2 + \dots + x^{10})^4$, i.e. in $\frac{x^4}{10^4}\left(\frac{1-x^{10}}{1-x}\right)^4$, and may be tabulated with the results of the experiment as follows:

 Aggregate.    Chance.    × 90 = "Expectation."    Experimental result.
  4 to  9       .0126            1.134                     0
 10 to 14       .0871            7.839                     7
 15 to 19       .2375           21.375                    27
 20 to 24       .3256           29.304                    25
 25 to 29       .2375           21.375                    22
 30 to 34       .0871            7.839                     9
 35 to 40       .0126            1.134                     0
               ------           ------                    --
               1.0000           90.000                    90

The total of all the pips in the 90 quartets was 1956, and the average per card 5.43. The average on all the cards in the packs was 5.5.

It is evident that the experiment corresponds with the expectation, approximately at any rate.
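The chances in the card experiment are coefficients of a polynomial, and may be obtained by repeated convolution instead of algebraic expansion. The sketch below assumes independent draws of pip values 1 to 10, as the text does; the function name `pip_total_ways` and the grouping list are introduced here for illustration.

```python
def pip_total_ways(cards=4, highest_pip=10):
    """Count the ways of reaching each total with the given number of cards."""
    ways = {0: 1}
    for _ in range(cards):
        extended = {}
        for total, count in ways.items():
            for pip in range(1, highest_pip + 1):
                extended[total + pip] = extended.get(total + pip, 0) + count
        ways = extended
    return ways


ways = pip_total_ways()
groups = [(4, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 40)]
grouped_ways = [sum(ways[r] for r in range(low, high + 1)) for low, high in groups]

for (low, high), count in zip(groups, grouped_ways):
    chance = count / 10**4
    print(f"{low:2d} to {high:2d}  chance {chance:.4f}  expectation {90 * chance:.3f}")
```

Each count divided by 10⁴ gives the chance for the group, and multiplying by 90 gives the expectation column.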

Bernoulli’s  Laws. 

We  must  next  inquire  what  correspondence  between  theo¬ 
retical  and  expected  frequency  the  theory  itself  leads  us  to 
expect.  The  Law  of  Error  supplies  a  test. 

Consider the group r = 15 to 19 in the above experiment. The chance of finding a number in this range is .2375 = p. In 90 experiments the chance of finding a number in this range t times is the (t + 1)th term of $(q + p)^{90}$. The most likely number of successes is 21 or 22 and the standard deviation of the possible number of successes is $\sqrt{pqn}$ where n = 90, i.e., about 4. In such a multiple experiment many
times  repeated,  the  chance  of  getting  anything  from  17  to  26 
successes  in  the  group  is  found  from  the  Table  to  be  about 
f  ;  that  we  should  obtain  so  great  a  number  as  27  (as  in 
the  experiment  tabulated)  the  chance  is  about  It  is  very 
unlikely that we should have a divergence from 21 by as much as 3 times the standard deviation; that is, more than 33 or less than 9 occurrences are very improbable.

This process, stated more generally, leads to Bernoulli's Laws, which may be paraphrased as follows. If an experiment, in which the chance of success is p, is performed n times, and p'n is written for the number of successes, then as n is increased p' tends to approach p. The chance of the occurrence of a deviation greater than $p \sim p'$ is

$$2\int_{z}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz, \quad \text{where } z = \frac{p \sim p'}{\sqrt{\dfrac{p(1-p)}{n}}},$$

and hence as $\sqrt{n}$ increases the chance of any assigned deviation diminishes. By increasing n sufficiently the chance can be made as small as we please.*
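The dependence of this chance on n can be illustrated numerically. The following sketch treats the chance of a deviation greater than an assigned amount as the two tails of the normal curve, which equals 1 − erf(z/√2); the function name `chance_of_deviation` and the observed proportion 27/90 used in the example are choices made here for illustration.

```python
from math import erf, sqrt


def chance_of_deviation(p, p_observed, n):
    """Return z and the chance of a deviation greater than |p - p_observed|."""
    z = abs(p - p_observed) / sqrt(p * (1 - p) / n)
    # Both tails of the normal curve beyond z: 2 * (1/2 - F(z)).
    return z, 1 - erf(z / sqrt(2))


p = 0.2375                       # chance of a total of 15 to 19 pips
for n in (90, 360, 900):
    z, chance = chance_of_deviation(p, 27 / 90, n)
    print(f"n = {n:4d}  z = {z:.3f}  chance = {chance:.5f}")
```

With the deviation of the proportions held fixed, z grows as √n and the chance falls away rapidly, as the paraphrase of Bernoulli's Laws asserts.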

Now  it  is  the  result  of  general  experience  and  many  experi¬ 
ments  that  Bernoulli’s  Laws  can  be  realised  in  fact. 

If,  then,  we  can  obtain  the  condition  of  a  priori  equally  likely 
occurrences,  we  may  calculate  the  chances  of  various  events 
by  the  methods  of  mathematical  probability,  and  expect  that 
our  calculations  will  be  realised  in  fact  within  a  margin  deter¬ 
minable  by  the  law  of  error. 

On  the  following  pages  the  results  of  various  experiments 
are  shown.  The  first  three  compare  the  distribution  found 
with  that  given  by  the  law  of  error,  and  the  remainder  show 
the  working  method  of  determining  the  size  of  a  class  in  a 
large  group  by  the  method  of  sampling. 


Examples. 

1. If a digit is taken at random the chance that it will be less than 5 (0, 1, 2, 3 or 4) is ½. The digits in the 7th decimal place of a book of logarithms were taken 50 at a time and the number (r) of digits less than 5 was noted. The chance of finding r such digits is the (r + 1)th term in the expansion of $(\tfrac12 + \tfrac12)^{50}$. n = 50, p = q = ½, $\sqrt{pqn} = 3.535 = \sigma$.

pn, the most probable number, is 25. The chance of not exceeding 25 + x is F(z) in the table, p. 271, where $z = \frac{x}{\sigma} = \frac{x}{3.535}$, if we assume that n = 50 is large enough in a symmetrical curve to allow the use of the normal curve instead of the binomial series. The 50-fold experiment was performed 300 times.

* Notice that $p \sim p'$ is the deviation of the proportions. The resulting actual deviation is $pn \sim p'n$, and z should then be written

$$\frac{pn \sim p'n}{\sqrt{p(1-p)\,n}}$$

and the chance increases as $\sqrt{n}$ increases.


    r        z        F(z)
  13.5   −3.2522    .4994
  14.5   −2.9694    .4986
  15.5   −2.6866    .4966
  16.5   −2.4038    .4919
  17.5   −2.1210    .4831
  18.5   −1.8382    .4670
  19.5   −1.5554    .4400
  20.5   −1.2726    .3984
  21.5   − .9898    .3389
  22.5   − .7070    .2602
  23.5   − .4242    .1643
  24.5   − .1414    .0561
  25.5   + .1414    .0561
  26.5   + .4242    .1643
  27.5   + .7070    .2602
  28.5   + .9898    .3389
  29.5   +1.2726    .3984
  30.5   +1.5554    .4400
  31.5   +1.8382    .4670
  32.5   +2.1210    .4831
  33.5   +2.4038    .4919
  34.5   +2.6866    .4966
  35.5   +2.9694    .4986
  36.5   +3.2522    .4994

                                Number of occurrences
  at r   Differences *   × 300    Expected †    Actual
   14       .0008          .2      0               1
   15       .0020          .6      0 or 1          0
   16       .0047         1.4      1 or 2          3
   17       .0088         2.6      2 or 3          2
   18       .0161         4.8      5               3
   19       .0270         8.1      8               7
   20       .0416        12.5      12 or 13        9
   21       .0595        17.85     18             18
   22       .0787        23.6      24             26
   23       .0959        28.8      29             21
   24       .1082        32.5      32 or 33       32
   25       .1122        33.7      34             42
   26       .1082        32.5      32 or 33       36
   27       .0959        28.8      29             30
   28       .0787        23.6      24             28
   29       .0595        17.85     18             15
   30       .0416        12.5      12 or 13       16
   31       .0270         8.1      8               5
   32       .0161         4.8      5               2
   33       .0088         2.6      2 or 3          2
   34       .0047         1.4      1 or 2          1
   35       .0020          .6      0 or 1          1
   36       .0008          .2      0               0
                        ------
                        299.6

The agreement is as close as the theory leads us to expect (see Chapter X). The standard deviation a priori is $\sqrt{pqn} = 3.535$. We can also find the standard deviation of the observations a posteriori by taking the square root of the second moment as on p. 253. The average is 25.043. The second moment of the observations about an origin at 25 is (1 × 11² + 0 × 10² + 3 × 9² + ... + 1 × 10²) ÷ 300 = 11.30, and about the average is 11.300 − .043² = 11.298. The square root is 3.361, which differs from the a priori value by .174, which is a not improbable deviation (see formula (120) below).
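The a posteriori figures quoted for this example can be recomputed from the column of actual occurrences, and the expected number at any r from the normal curve. The sketch below is illustrative: the dictionary `observed` transcribes the actual occurrences at r = 14 to 36, and the helper names are assumptions made here.

```python
from math import erf, sqrt

# Actual number of 50-fold experiments giving r digits less than 5.
observed = {14: 1, 15: 0, 16: 3, 17: 2, 18: 3, 19: 7, 20: 9, 21: 18,
            22: 26, 23: 21, 24: 32, 25: 42, 26: 36, 27: 30, 28: 28,
            29: 15, 30: 16, 31: 5, 32: 2, 33: 2, 34: 1, 35: 1, 36: 0}

trials = sum(observed.values())
average = sum(r * f for r, f in observed.items()) / trials
second_moment_about_25 = sum((r - 25) ** 2 * f for r, f in observed.items()) / trials
second_moment = second_moment_about_25 - (average - 25) ** 2
standard_deviation = sqrt(second_moment)

sigma_a_priori = sqrt(50 * 0.5 * 0.5)


def signed_area(z):
    """Area of the normal curve from 0 to z, negative when z is negative."""
    return 0.5 * erf(z / sqrt(2))


def expected_at(r):
    """Expected occurrences at r: the area on r - 1/2 to r + 1/2, times 300."""
    upper = signed_area((r + 0.5 - 25) / sigma_a_priori)
    lower = signed_area((r - 0.5 - 25) / sigma_a_priori)
    return trials * (upper - lower)


print(trials, round(average, 3), round(standard_deviation, 3))
print(round(expected_at(25), 1), round(expected_at(20), 1))
```

The half-unit limits reproduce the device used in the table, where the ordinate at each whole number r is replaced by the area standing on the base r − ½ to r + ½.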

2.  Instead  of  finding  the  expectation  at  each  value,  we  can 
test  the  distribution  by  the  method  illustrated  in  the  following 
example. 

* Thus when r = 13.5 and 14.5, F(z) = .4994 and .4986. The difference, .0008, × 300, is the expected number at r = 14.

† Nearest whole numbers from the previous column.

In a book, in which a page contained 37 lines, it was counted on each of 100 pages in how many cases the first (complete) word in a line contained 1, 2, or 3 letters. In 3700 lines such first words occurred 1317 times. The chance, then, that a first word contained 3 letters or less was

$$p = \frac{1317}{3700}, \quad q = 1 - p = \frac{2383}{3700}.$$

The chance of finding r such first words in a page was approximately the (r + 1)th term in $(q + p)^{37}$.

The a priori standard deviation is $\sqrt{pqn} = 2.913 = \sigma$.

The occurrences were as follows.

 Number of first words      Number of pages on which
 of 3 letters or less.      these occurred.
          7                          1
          8                          2
          9                          9
         10                          6
         11                          8
         12                         17
         13                         15
         14                         12
         15                         13
         16                          5
         17                          4
         18                          2
         19                          3
         20                          2
         21                          0
         22                          1

Average 13.17 = $\bar{x}$; standard deviation calculated from the observations 2.922. Now calculate the number of cases to be expected in grades each of σ measured from the average.

 Boundary.               F(z).             Difference × 100.    Occurrences.
 x̄ − 3σ  =  4.43        F(−3) = .499
                                                 2.3            Under 7½          1
 x̄ − 2σ  =  7.34        F(−2) = .477
                                                13.6            7½  to 10½       17
 x̄ −  σ  = 10.26        F(−1) = .341
                                                34.1            10½ to 13½       40
 x̄       = 13.17        F(0)  = 0
                                                34.1            13½ to 16½       30
 x̄ +  σ  = 16.08        F(1)  = .341
                                                13.6            16½ to 19½        9
 x̄ + 2σ  = 19.00        F(2)  = .477
                                                 2.3            19½ to 22½        3
 x̄ + 3σ  = 21.91        F(3)  = .499
                                               -----                            ---
                                               100.0                            100

In observations where the measurements are necessarily integral, it is not easy to adjust the grades to multiples of σ. But where the observational grades are narrow, or the measurements continuous, this method (proceeding by equal submultiples of σ) is rapid, and since the grading can be decided before the test is applied, affords a good and simple test.
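The grading test of this example can be mechanised as follows. The sketch assumes, as the figures in the example suggest, that the two outermost grades take in the whole of each tail beyond 2σ, and that the observed grades are cut at the half-integers nearest the multiples of σ; the variable names are introduced here and are not the author's.

```python
from math import erf, sqrt


def F(z):
    """Area of the normal curve standing on the base 0 to z (sigma = 1)."""
    return 0.5 * erf(z / sqrt(2))


# Number of pages on which r first words of 3 letters or less occurred.
pages = {7: 1, 8: 2, 9: 9, 10: 6, 11: 8, 12: 17, 13: 15, 14: 12,
         15: 13, 16: 5, 17: 4, 18: 2, 19: 3, 20: 2, 21: 0, 22: 1}

# Expected percentages in grades of sigma; the end grades include the tails.
expected = [100 * (0.5 - F(2)), 100 * (F(2) - F(1)), 100 * F(1),
            100 * F(1), 100 * (F(2) - F(1)), 100 * (0.5 - F(2))]

# Observed grades, cut at half-integers as in the example.
cuts = [7.5, 10.5, 13.5, 16.5, 19.5]
occurrences = [0] * (len(cuts) + 1)
for r, count in pages.items():
    grade = sum(1 for cut in cuts if r > cut)   # index of the grade holding r
    occurrences[grade] += count

for e, o in zip(expected, occurrences):
    print(f"expected {e:5.1f}   observed {o:3d}")
```

Since the grades are fixed before the observations are sorted into them, the comparison of the two columns is not influenced by the data.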

3. A similar experiment was made of a list of firms, in which there were 74 pages containing about 40 names each. Each firm had been marked for administrative purposes if it employed a certain number of women. One-fifth of all the firms were so marked. On any page the chance of finding r firms was therefore the (r + 1)th term in $(q + p)^{40}$ where

$$p = \tfrac{1}{5}, \quad \sigma = \sqrt{\tfrac15 \cdot \tfrac45 \cdot 40} = 2.53, \quad pn = 8.$$


 Between                          Expected.    Actual.
 pn + 2σ    and  pn + 3σ             1.7       2 or 3
 pn + 3/2σ  and  pn + 2σ             3.3       5, 6 or 7
 pn + σ     and  pn + 3/2σ           6.8       4 or 5
 pn + ½σ    and  pn + σ             11.0       9, 10, 11
 pn         and  pn + ½σ            14.2       13 or 14
 pn − ½σ    and  pn                 14.2       15 or 16
 pn − σ     and  pn − ½σ            11.0       8
 pn − 3/2σ  and  pn − σ              6.8       7
 pn − 2σ    and  pn − 3/2σ           3.3       2
 pn − 3σ    and  pn − 2σ             1.7       5

The  alternatives  in  the  final  column  are  due  to  the  difficulty 
of  adjusting  the  entries  to  the  predetermined  grades. 

In  this  case  the  preliminary  condition  of  independence  is 
not  completely  fulfilled  ;  the  chance  of  finding  a  marked  name 
should  not  be  affected  by  the  presence  or  absence  of  marked 
names  on  the  same  page  ;  but  in  fact  in  some  cases  the  name  of 
a  firm  was  repeated  for  each  of  its  branches,  and  all  the  branches 
did  or  all  did  not  employ  women. 


Application  to  Sampling. 

One  of  the  principal  uses  of  the  theorem  relating  to  the 
number  of  successes  to  be  expected  in  a  given  number  of  trials 
is  in  the  examination  of  a  large  group  by  means  of  samples. 
In  its  simplest  form  the  method  is  as  follows. 

In a “universe” containing N things or persons, pN possess a defined attribute, where N is known but p is not known.

n things are selected at random from the universe, and of them p'n are found to possess the attribute.

If $\frac{n}{N}$ is small,* and if in the process of selection everything in the universe has an equal chance of being chosen, and if the choice of one thing does not influence the choice of any other, then the chance of finding (p + x)n things is given by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}, \quad \text{where } \sigma = \sqrt{\frac{p(1-p)}{n}},$$

and the table on p. 271 can be applied. The precision, measured by $\frac{1}{\sigma}$, increases with $\sqrt{n}$.

* The necessary correction, when $\frac{n}{N}$ is not negligible, is given below, pp. 282-4.

It is shown below (p. 417) that in evaluating σ, the value p', observed in the sample, can be substituted for the unknown true value p.

The result may be stated thus: the value of p in the universe is $p' \pm \sqrt{\frac{p'(1-p')}{n}}$, the expression meaning that p' is the most probable value from the data, and that the chances of variations from p' are given by the Table, p. 271, where the standard deviation (the unit in the Table) is $\sqrt{\frac{p'(1-p')}{n}}$.

It  is  clear  that  this  value  can  only  be  applied  to  the  defined 
universe,  the  members  of  which  have  the  chance  of  being 
enumerated.  The  importance  of  this  and  other  conditions  can 
be  best  illustrated  by  an  example. 

In  Reading  609  working-class  houses  were  visited,  and  in 
154  of  them  it  was  found  that  there  were  more  than  1  and  less 
than 2 inhabitants per room. Here n = 609, p'n = 154, p' = .253,
$\sqrt{p'q'/n}$ = .0176. The proportion of houses thus occupied is

.253 ± .0176.
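The arithmetic of the Reading example may be checked by a short computation. The following is only a sketch: the function name `proportion_with_sd` is introduced here for illustration and is not part of the original text.

```python
import math


def proportion_with_sd(successes, n):
    """Return the sample proportion p' and its standard deviation sqrt(p'q'/n)."""
    p_dash = successes / n
    return p_dash, math.sqrt(p_dash * (1 - p_dash) / n)


# Reading: 154 houses out of 609 had more than 1 and less than 2 persons per room.
p_dash, sd = proportion_with_sd(154, 609)
print(f"p' = {p_dash:.3f}, standard deviation = {sd:.4f}")
```

The printed values should agree with the .253 and .0176 of the text to the number of figures there given.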

The  “  universe  ”  here  is  the  group  of  houses  (about  12,000) 
from  which  the  609  were  selected.  This  group  was  determined 
from  a  local  directory,  from  which  middle-class  and  large  houses 
were  eliminated  by  the  help  of  a  list  of  “  principal  residents  ” 
and  by  local  knowledge,  and  non-residential  houses  were 
omitted.  The  accuracy  of  the  measurement  for  working-class 
Reading  depends  on  the  completeness  and  accuracy  of  the 
directory and on the appositeness of the method of elimination.
If a rookery of slum dwellings had been omitted, by so
much  the  universe  would  have  been  curtailed  ;  or  if  a  street 
of  middle-class  houses  had  been  included  the  universe  would 
have  been  extended,  unless  in  the  process  of  investigation  the 
error  had  been  found. 

In  this  case  the  selection  was  made  by  marking  one  house 
in  20  throughout  the  amended  directory.  It  is  shown  on 
p.  332  that  this  gives  a  more  precise  result  than  if  a  purely 
random  method  had  been  followed.  A  general  method  of 
securing randomness is to give numbers from 1 to N to the
things in the universe, and by the use of tables of figures or
otherwise select n numbers.* Great care must be taken to
ensure pure randomness or some method which gives a more precise
result than pure randomness. It was found, for example,
that  in  the  latitude  experiment  (p.  281)  randomness  was  not 
obtained  by  selecting  pages  and  dropping  a  pencil  on  names  ; 
the  entries  in  a  page  were  not  independent  of  each  other.  Any 
divergence  from  the  rule  that  every  item  must  have  the  same 
chance  of  inclusion  may  affect  the  result  disastrously. 

Of  course  inaccuracy  of  information  (e.g.,  as  to  the  number 
of  persons  resident  in  a  house)  is  to  be  avoided  ;  but  if  the 
errors  due  to  this  source  are  equally  likely  to  be  in  excess  or 
defect,  the  result  is  not  much  affected. 

It  should  be  noticed  that  the  accuracy  of  the  result  depends 
on  n  the  number  in  the  sample,  and  not  on  N  the  number  in  the 
universe.  The  size  of  the  universe  only  affects  the  problem 
in  that,  when  the  N  things  are  numerous  and  scattered,  it  is 
difficult  to  get  an  accurate  enumeration  and  secure  that  each 
has  an  equal  chance  of  being  chosen,  and  it  becomes  possible 
that  parts  are  omitted  from  ignorance  of  their  existence,  which 
differ essentially from the major parts included. Further,
when p is small, pN may be moderately large, while pn is
relatively small. Now if pn is small, the approximation to
the curve of error (p. 265) tends to break down, and the term
involving κ is not negligible; so that the terms
of the binomial $(q + p)^{n}$ should be used instead of the integral
table. A little examination of numerical cases will show that
for certain small values of p it is quite possible that no thing
having the attribute will be found; thus, if 30 houses in a town
containing 10,000 houses are overcrowded, and 800 houses
are examined, the chance of finding no overcrowded house is
$q^{n}p^{0}$, where p = .003, q = .997, n = 800, that is .09; so that
a report based on the sample might not contain reference to
overcrowding, unless to say that there was no evidence of it.


*  For  example,  if  N  =  10,000  and  n  =  500,  we  might  take  the  last  four 
digits  of  pages  in  7  figure  tables  till  we  had  500  numbers  all  between 
0 and 10,001, and investigate the things to which these numbers were affixed.
This  method  was  used  in  the  experiment  on  the  number  of  persons  in  a 
parish.  See  next  page. 
But if p = .03, $q^{n}p^{0}$ is only of the order of $10^{-11}$, and some instances
would certainly be found. As to the chances of occurrence of
small numbers, see p. 284 below.
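The two cases just contrasted can be reproduced in a few lines. This is a minimal sketch assuming independent selections, as in the text; the helper `chance_of_none` is a name adopted here for illustration only.

```python
def chance_of_none(p, n):
    """Chance that none of n independent selections shows the attribute: q**n."""
    return (1 - p) ** n


# 30 overcrowded houses in 10,000 (p = .003), 800 houses examined.
rare = chance_of_none(0.003, 800)
# The same sample size when p = .03.
common = chance_of_none(0.03, 800)

print(f"p = .003: {rare:.3f}")
print(f"p = .03 : {common:.1e}")
```

With p = .003 the sample misses the attribute altogether about once in eleven trials, whereas with p = .03 such a miss is practically impossible.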

Finally  it  should  be  emphasised  that  when  the  things  that 
should  be  included  are  determined  by  marking  in  a  list  or 
otherwise,  no  difficulties  of  measurement  should  be  allowed 
to  stand  in  the  way  of  their  inclusion.  If  a  householder  refuses 
information,  or  part  of  a  consignment  of  goods  is  out  of  the 
way,  there  is  a  presumption  that  the  characteristics  of  the 
house  or  the  goods  are  not  normal,  and  unless  the  difficulty  is 
overcome,  some  part  of  the  universe  is  not  represented. 


Examples  of  Sampling. 

1.  The  12,830  civil  parishes  enumerated  in  the  Census  of 
England  and  Wales,  1911,  were  numbered,  and  250  selected 
by  numbers  taken  from  logarithmic  tables.  The  following 
table  compares  the  distribution  of  the  parishes  according  to 
their  populations  in  the  sample  and  in  the  whole  group  (which 
is  set  out  in  the  Census  Volume,  Cd.  6258,  p.  428). 


Number of Persons in Parish.

                           Under   100 to  200 to  300 to  400 to  500 to  1000 or
                           100.    200.    300.    400.    500.    1000.   more.

Number in sample of 250      35      52      42      27      20      41      33
1000 p'                     140     208     168     108      80     164     132
1000 √(p'q'/250)             22      26      24      20      17      23      21
Actual per 1000             152     192     147     108      80     173     146

To take the first column as an example, 35 were found
in the sample of 250 with population less than 100.

$$p' = \frac{35}{250} = .14.$$

The forecast per 1000 parishes is therefore .14 of 1000 = 140.

The standard deviation of p' is $\sqrt{\dfrac{p'(1-p')}{n}}$ = .022, p. 278, and
therefore the standard deviation of 1000 p', i.e. of the forecast
140, is 22. Actually in England and Wales there were 152 per
1000 parishes with less than 100 people. The forecast differs
from the fact by about half the standard deviation. (Statistical
Journal, 1912-13, p. 182.)
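The rows of the parish table can be recomputed from the sample counts alone. The sketch below takes the counts and the census figures from the table; the list names and the ratio of error to standard deviation in the last column are conveniences added here, not part of the original.

```python
import math

# Sample counts and census figures per 1000, in the order of the table columns.
sample = [35, 52, 42, 27, 20, 41, 33]
actual_per_1000 = [152, 192, 147, 108, 80, 173, 146]

n = sum(sample)  # size of the sample
rows = []
for count, actual in zip(sample, actual_per_1000):
    p_dash = count / n
    forecast = 1000 * p_dash
    sd = 1000 * math.sqrt(p_dash * (1 - p_dash) / n)
    # Difference between fact and forecast, in units of the standard deviation.
    rows.append((round(forecast), round(sd), round((actual - forecast) / sd, 2)))

for row in rows:
    print(row)
```

In this sample no forecast differs from the census figure by as much as one standard deviation.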
2. From a list of the rates of dividends of 3878 companies
400 were selected and tabulated.

Rate of Dividend per cent.

                                  Below £3    £3      £4      £5      £6      £7

Number of companies in sample        34      108     117      60      48      33
1000 p'                              85      270     292½    150     120      82½
1000 √(p'q'/400)                     14       22      23      18      16      14
In full list per 1000                75      272     311     177     108      57

(Statistical Journal, 1906, p. 552.)


3.  From  a  geographical  index  containing  31,210  names 
500  places  were  selected  and  their  latitudes  tabulated.  To 
secure  randomness  the  columns  of  names  were  numbered  and 
selection  made  from  numbers  in  mathematical  tables  ;  a  foot- 
rule  was  placed  over  the  column,  and  the  entry  against  the 
number  of  inches  on  the  rule  determined  by  the  first  digit  of 
the  longitude  of  the  first  place  in  the  column  was  selected. 
This elaborate method was found necessary to secure independence.

Latitude, North or South.

                            0° to  10° to  20° to  30° to  40° to  50° to  60° to  70° to  80° to
                            10°    20°     30°     40°     50°     60°     70°     80°     90°

Number of places in sample    22     56     104     103      93     112       9       1       0
1000 p'                       44    112     208     206     186     224      18       2       0
1000 √(p'q'/500)               9     14      18      18      17      19       6       ?       ?
In full list per 1000         51    111     201     200     200     215      18     3.4     0.9

Notice that the places north of 80° N. and south of 80° S.
were missed in the accident of the selection. In another
selection where n = 2000, 1 per 1000 were found in these
latitudes.

4.  Out  of  the  householders’  schedules  of  the  1911  Census, 
1  in  50  in  order  throughout  the  files  were  selected  in  Shoreditch, 
and  the  personnel  of  the  households  classified. 


                                       Occupied Persons.               Unoccupied.
                                  Males.            Females.
                              Over      Under     Over      Under     Over      Under
                              20 years. 20 years. 18 years. 18 years. 14 years. 14 years.  Total

Number of persons in sample     538       112       310        74       386       718      2138
1000 p'                         251        52       145        35       181       336      1000
1000 √(p'q'/2138)                 9         5         8         4         8        10        -
Distribution per 1000
  from Census tables            258        55       144        33       185       325      1000
Case when the Universe is not practically Unlimited or the
Selections are not Independent.

In  the  statement  of  the  experiment  which  leads  to  the 
normal  curve  of  error  (pp.  263  seq.)  it  was  assumed  that  the 
chance  of  success  for  each  throw  or  draw  was  always  the  same 
(p.  261),  and  that  each  trial  was  uninfluenced  by  what  had 
already  happened.  In  practice  this  condition  is  seldom 
completely  satisfied,  but  we  can  prove  in  a  similar  manner  that 
the  normal  law  of  error  is  obtained  under  a  wider  hypothesis. 

Let a universe contain N objects, of which pN possess a certain
quality or attribute, and qN do not (p + q = 1). Let a selection of
n be made in such a way that every object in the universe has the
same chance of being chosen. Write $P_x$ for the probability that
pn + x of the selected objects shall possess the quality in question.
E.g., if the “universe” is a box containing 1000 balls of which
100 are white and the rest coloured, and if the contents are
thoroughly mixed and 50 selected, then N = 1000, p = 1/10 (where
white is the attribute), n = 50, pn = 5, and $P_x$ is the probability
that 5 + x white balls are present in the selection.

The whole number of different possible selections is ${}_{N}C_{n}$.

The number of selections in which pn + x are white and the
remainder (qn − x) are coloured is ${}_{pN}C_{pn+x} \times {}_{qN}C_{qn-x}$.

$$\text{Hence}\quad P_x = \frac{{}_{pN}C_{pn+x} \times {}_{qN}C_{qn-x}}{{}_{N}C_{n}}
= \frac{(pN)!\,(qN)!\,n!\,M!}{(pn+x)!\,(pM-x)!\,(qn-x)!\,(qM+x)!\,N!},$$

where M = N − n.


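Apply Stirling’s theorem to the factorials, neglecting $\dfrac{1}{pn}$, $\dfrac{1}{n}$ and
smaller quantities. (App. formula (134).)

$$P_0 = \frac{(pN)^{pN}(qN)^{qN}\,n^{n}M^{M}}{(pn)^{pn}(pM)^{pM}(qn)^{qn}(qM)^{qM}\,N^{N}}
\cdot\frac{e^{0}}{\sqrt{2\pi}}
\cdot\left(\frac{pN\cdot qN\cdot n\cdot M}{pn\cdot pM\cdot qn\cdot qM\cdot N}\right)^{\frac12},$$

the index of e being

pn + pM + qn + qM + N − pN − qN − n − M
= 0, since p + q = 1.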

* E.g. the chance of obtaining 3 aces in a hand of 13 dealt from a pack of
52 is $P_2 = {}_{4}C_{3} \times {}_{48}C_{10} \div {}_{52}C_{13}$; here N = 52, n = 13, p = 1/13, x = 2.
$P_2$ = .041 approx.
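The exact expression for $P_x$ is easily evaluated with integer arithmetic. The following sketch applies it to the hand of cards in the footnote; the function `chance_in_sample` and its argument names are introduced here for illustration, with K standing for pN and k for pn + x.

```python
from math import comb


def chance_in_sample(N, K, n, k):
    """Chance that a selection of n from N objects, K of which have the
    attribute, contains exactly k with the attribute."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Three aces in a hand of 13 from a pack of 52 (N = 52, pN = 4, n = 13, pn + x = 3).
three_aces = chance_in_sample(52, 4, 13, 3)
# The chances of 0, 1, 2, 3 or 4 aces must together make up unity.
total = sum(chance_in_sample(52, 4, 13, k) for k in range(5))

print(f"chance of three aces = {three_aces:.3f}")
```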
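When the indices are collected it is found that

$$P_0 = \left(\frac{N}{2\pi pqnM}\right)^{\frac12} \qquad\ldots\qquad (27)$$

$$\frac{P_x}{P_0} = \frac{(pn)!\,(pM)!\,(qn)!\,(qM)!}{(pn+x)!\,(pM-x)!\,(qn-x)!\,(qM+x)!}$$

$$= \frac{(pn)^{pn}(pM)^{pM}(qn)^{qn}(qM)^{qM}\cdot(2\pi)^{0}\cdot e^{0}\cdot(pn\cdot pM\cdot qn\cdot qM)^{\frac12}}
{(pn+x)^{pn+x+\frac12}(pM-x)^{pM-x+\frac12}(qn-x)^{qn-x+\frac12}(qM+x)^{qM+x+\frac12}}$$

$$= \left(1+\frac{x}{pn}\right)^{-(pn+x+\frac12)}\left(1-\frac{x}{pM}\right)^{-(pM-x+\frac12)}
\left(1-\frac{x}{qn}\right)^{-(qn-x+\frac12)}\left(1+\frac{x}{qM}\right)^{-(qM+x+\frac12)}.$$

$$\log P_x/P_0 = -\left(pn+x+\tfrac12\right)\log\left(1+\frac{x}{pn}\right) - \left(pM-x+\tfrac12\right)\log\left(1-\frac{x}{pM}\right)
- \left(qn-x+\tfrac12\right)\log\left(1-\frac{x}{qn}\right) - \left(qM+x+\tfrac12\right)\log\left(1+\frac{x}{qM}\right)$$

$$= -\left(pn+x+\tfrac12\right)\left(\frac{x}{pn}-\frac{x^{2}}{2p^{2}n^{2}}+\ldots\right)
+ \left(qn-x+\tfrac12\right)\left(\frac{x}{qn}+\frac{x^{2}}{2q^{2}n^{2}}+\ldots\right)$$
$$\qquad + \left(pM-x+\tfrac12\right)\left(\frac{x}{pM}+\frac{x^{2}}{2p^{2}M^{2}}+\ldots\right)
- \left(qM+x+\tfrac12\right)\left(\frac{x}{qM}-\frac{x^{2}}{2q^{2}M^{2}}+\ldots\right)$$

$$= -\frac{x^{2}}{2}\left(\frac{1}{pn}+\frac{1}{qn}+\frac{1}{pM}+\frac{1}{qM}\right)
- \frac{x}{2}\left(\frac{1}{pn}-\frac{1}{qn}-\frac{1}{pM}+\frac{1}{qM}\right)
+ \frac{x^{2}}{4}\left(\frac{1}{p^{2}n^{2}}+\frac{1}{q^{2}n^{2}}+\frac{1}{p^{2}M^{2}}+\frac{1}{q^{2}M^{2}}\right) + \ldots$$

n is of course less than N, and we may take it (without loss of
generality) as less than ½N and therefore less than M.

Let pn and qn be at least moderately large, so that we proceed
in ascending powers of $\dfrac{1}{\sqrt{pn}}$.

A solution is then obtained if we take $\dfrac{x^{2}}{n}$ as a quantity comparable
with unity, and therefore $\dfrac{x}{n}$ as of order $\dfrac{1}{\sqrt n}$ and $\dfrac{x^{2}}{n^{2}}$ as of
order $\dfrac{1}{n}$, as on p. 266 above.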
Then neglecting terms of orders $\dfrac{1}{\sqrt n}$ and higher, we have

$$\log P_x/P_0 = -\frac{x^{2}}{2}\left(\frac{p+q}{pqn}+\frac{p+q}{pqM}\right) = -\frac{x^{2}(n+M)}{2pqnM} = -\frac{x^{2}N}{2pqnM}.$$

Write $\sigma^{2}$ for $\dfrac{pqnM}{N}$. Then

$$P_0 = \frac{1}{\sigma\sqrt{2\pi}} \quad\text{and}\quad P_x = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2\sigma^{2}}}.$$

This is the normal curve of error, and σ (as above shown,
formula (21)) is its standard deviation.

$$\sigma^{2} = pqn\cdot\frac{M}{N} = pqn\left(1-\frac{n}{N}\right) \qquad\ldots\qquad (28)$$

and is smaller than its value (pqn) under the conditions of
pp. 261-7, but tends to reach it (as it should) when N
becomes indefinitely great.
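Formula (28) may be compared with the square of the standard deviation computed directly from the exact chances $P_x$. The sketch below does this for the box of balls described earlier (N = 1000, 100 white, n = 50); the function `exact_variance` is a name adopted here for illustration, and the exact value is obtained by plain summation rather than by any closed formula.

```python
from math import comb


def exact_variance(N, K, n):
    """Variance of the number possessing the attribute in a selection of n
    from N objects, K of which possess it, from the exact chances."""
    total = comb(N, n)
    chances = [comb(K, k) * comb(N - K, n - k) / total for k in range(min(K, n) + 1)]
    mean = sum(k * c for k, c in enumerate(chances))
    return sum((k - mean) ** 2 * c for k, c in enumerate(chances))


N, K, n = 1000, 100, 50
p = K / N
q = 1 - p

unlimited = p * q * n                # value under the conditions of pp. 261-7
corrected = p * q * n * (1 - n / N)  # formula (28)
exact = exact_variance(N, K, n)

print(unlimited, corrected, round(exact, 4))
```

The corrected value lies close to the exact one and below pqn, as the text asserts.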


Law  of  Small  Numbers. 


In  the  deduction  of  the  normal  curve  from  the  terms  of 
(p  +  q)n  it  was  assumed  that  not  only  n,  but  also  pqn,  was 
large.  An  interesting  case  arises  when  p  is  so  small  that  pn 
is  no  longer  large,  q  being  in  that  case  nearly  equal  to  1. 

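Let u = pn, and be a small finite number. Then $p = \dfrac{u}{n}$.

The chance of r successes in n independent experiments is

$$P_r = \frac{n!}{(n-r)!\,r!}\,p^{r}q^{n-r}
= \frac{u^{r}}{r!}\left(1-\frac{u}{n}\right)^{n-r}\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{r-1}{n}\right).$$

Neglect $\dfrac{r^{2}}{n}$; then the product of the r − 1 factors in brackets,
which is between 1 and $1-\dfrac{r(r-1)}{2n}$, may be taken as 1.

$\left(1-\dfrac{u}{n}\right)^{n}$ tends to $e^{-u}$, and $\left(1-\dfrac{u}{n}\right)^{-r}$ tends to 1, as $\dfrac{r}{n}$ tends to 0.

$$\text{In all}\quad P_r = e^{-u}\cdot\frac{u^{r}}{r!} \qquad\ldots\qquad (29)$$

when $\dfrac{1}{n}$, $\dfrac{r}{n}$ and $\dfrac{r^{2}}{n}$ are neglected.

$$\sigma^{2} = pqn = u\left(1-\frac{u}{n}\right), \qquad \sigma = \sqrt{u},\ \text{approx.} \qquad\ldots\qquad (30)$$

The whole curve is then determined by u, without separate
reference to p and n, since its average is u, its standard
deviation $\sqrt{u}$, and its “κ” $\dfrac{1}{\sqrt u}$. It follows that the values
of p and n are not easily determined separately from
observations.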

The greatest term of the binomial expansion is

$$P_u = \frac{e^{-u}u^{u}}{u!}\ {}^{*}$$

when u is integral, and then

$$\frac{P_r}{P_u} = u^{\,r-u}\cdot\frac{u!}{r!} = \frac{u}{u+1}\cdot\frac{u}{u+2}\cdots\frac{u}{r} \qquad (r > u),$$

and this rapidly becomes small as $\dfrac{r}{u}$ passes through integral
values. E.g. if u = 6, and r = 3u, $P_{3u}$ = .00004.

Consequently  the  observed  values  never  differ  greatly  from 
their  average.  Attention  has  been  directed  to  the  agreement 
between  the  fluctuations  of  small  numbers  and  the  law  of 
distribution  thus  described,  and  examples  have  been  given  by 
Bortkiewicz  ( Das  Gesetz  der  kleinen  Zahlen,  1898)  and 
Mortara  (Annali  di  Statistica,  Serie  V,  vol.  4,  1912).  It  is 


* If u is as great as 10, this differs from $\dfrac{1}{\sqrt{2\pi u}}$ by less than 1 per cent.
also interesting to notice that the theory leads to what may be
called  the  permanence  of  small  numbers.  If  among  a  great 
number of things there are a few which present some particular
feature, it is a matter of common experience that this
small  number  is  seldom  much  exceeded  and  seldom  entirely 
vanishes ;  this  experience  applies  to  accidents,  fires,  the 
traditional “Derby dog,” and to the rare events and coincidences
with which some newspapers fill their columns.
Specialists  in  all  professions,  from  the  doctor  who  treats  only 
one  obscure  disease  of  the  ear  to  the  dealer  in  curiosities,  make 
their  livelihood  dependent  on  this  permanence  of  small  numbers. 

To  take  an  example :  Out  of  some  530,000  deaths  annually 
from  all  causes  the  following  are  the  numbers  from  splenic 
fever  in  the  years  1875  to  1894  : — 


4, 10, [...], 12, [...], 15, 8, 18, 11, [...], 4, 3, 6.

Average 9.75 = pn = u.    $e^{-u}$ = .00005842.

     r           $e^{-u}\,u^{r}/r!$      Forecast.          Actual.

     0            .00006       × 20 =  .001        0
   1 to  4        .0343             =  .7          3
   5 to  9        .4564             =  9.1         6
  10 to 14        .4408             =  8.8         8
  15 to 19        .0683             =  1.4         3
  20 . .          small                            0
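The forecast column above follows from formula (29) with u = 9.75 and 20 years of observation. A minimal sketch of the computation is given here; the function names are introduced for illustration, and since $e^{-u}$ is evaluated directly rather than taken from the printed figure, the results may differ from the table in the last place.

```python
import math


def small_numbers_chance(u, r):
    """Chance of exactly r occurrences by formula (29): e**(-u) * u**r / r!."""
    return math.exp(-u) * u ** r / math.factorial(r)


def band_chance(u, low, high):
    """Chance that the number of occurrences lies between low and high inclusive."""
    return sum(small_numbers_chance(u, r) for r in range(low, high + 1))


u, years = 9.75, 20
bands = [(0, 0), (1, 4), (5, 9), (10, 14), (15, 19)]
forecast = {band: years * band_chance(u, *band) for band in bands}

for band, value in forecast.items():
    print(band, round(band_chance(u, *band), 4), round(value, 1))

# The illustration given earlier: u = 6 and r = 3u = 18.
print(f"{small_numbers_chance(6, 18):.5f}")
```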

CHAPTER  III. 


THE  LAW  OF  GREAT  NUMBERS  ( THE  GENERALISED 

LAW  OF  ERROR). 

So far we have treated the normal curve of error as the
limit of the binomial $(q + p)^{n}$, and shown applications of its
integral to cases where p had a definite meaning. The same
equation $y = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2\sigma^{2}}}$, however, is found as the result of
much wider hypotheses, and it is the main purpose of this
chapter to develop them.

Before  proceeding  to  the  general  law  there  are  some 
important  propositions  to  consider  as  to  the  relation  between 
the  standard  deviation  of  a  sum  or  average  of  magnitudes 
selected  from  a  large  group  or  groups,  and  the  standard 
deviations  of  the  magnitudes  themselves.  These  propositions 
(pp.  287-9)  depend  only  on  the  fundamental  laws  of 
probability,  and  are  independent  of  any  process  of  limits  or 
of  neglect  of  small  quantities. 


Standard  Deviation  and  Mean  Cube  of  Error  of  a  Sum  and 

Average. 

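Let $u_1, u_2 \ldots u_t \ldots u_{m_1}$ be $m_1$ measurements which form a
frequency group, and let $\bar u$ be their average and $\sigma_u$ their standard
deviation.

Let $u_t = \bar u + u'_t$.

Then $m_1\bar u = S u_t$ and $\therefore\ S u'_t = 0$,

and $m_1\sigma_u^{2} = S u'^{2}_t = S(u_t - \bar u)^{2} = S u_t^{2} - 2\bar u\cdot S u_t + m_1\bar u^{2}$
$= S u_t^{2} - 2\bar u\cdot m_1\bar u + m_1\bar u^{2}$

$$\text{and}\quad S u_t^{2} = m_1(\sigma_u^{2} + \bar u^{2}) \qquad\ldots\qquad (32)$$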
Let $v_1, v_2 \ldots v_t \ldots v_{m_2}$ be $m_2$ measurements in a second
frequency curve, whose average and standard deviation are $\bar v$
and $\sigma_v$.

Now select at random one object from each group, say $u_s$ and $v_t$.
Required the average and standard deviation of the group formed
by all possible values of $u_s + v_t$, every double selection being quite
independent of every other. Let $H_2$ be the average and $s_2$ the
standard deviation of this group.

We will suppose that an indefinitely great number of independent
selections is made, so that in the new group the $m_1 \times m_2$
possible values of $u_s + v_t$ occur with equal frequency.

Then $H_2 \times m_1 \times m_2$
$= (u_1+v_1) + \ldots + (u_1+v_t) + \ldots + (u_1+v_{m_2})$
$\ + (u_2+v_1) + \ldots + (u_2+v_t) + \ldots + (u_2+v_{m_2}) + \ldots$ ($m_2$ terms in each of $m_1$ lines)
$= m_2\cdot S u_s + m_1\cdot S v_t = m_2 m_1\bar u + m_1 m_2\bar v$

$$\therefore\ H_2 = \bar u + \bar v \qquad\ldots\qquad (33)$$

and $m_1 m_2(s_2^{2} + H_2^{2}) = S(u_s + v_t)^{2}$
$= (u_1+v_1)^{2} + \ldots + (u_1+v_{m_2})^{2} + (u_2+v_1)^{2} + \ldots + (u_2+v_{m_2})^{2} + \ldots$
$= m_2\,S u_t^{2} + m_1\,S v_t^{2} + 2\cdot S u_t\cdot S v_t$
$= m_2 m_1(\sigma_u^{2}+\bar u^{2}) + m_1 m_2(\sigma_v^{2}+\bar v^{2}) + 2 m_1\bar u\cdot m_2\bar v$
$= m_1 m_2\{(\sigma_u^{2}+\sigma_v^{2}) + (\bar u+\bar v)^{2}\}$

$$\therefore\ s_2^{2} = \sigma_u^{2} + \sigma_v^{2} \qquad\ldots\qquad (34)$$

If the group was formed by the difference $u_s - v_t$ instead of the
sum, we should obtain in a similar way $H_2 = \bar u - \bar v$, but $s_2^{2} = \sigma_u^{2}+\sigma_v^{2}$
as before.

Next let the sum (or difference) be formed from three groups,
the averages and standard deviations being $\bar u, \bar v, \bar w$ and $\sigma_u, \sigma_v, \sigma_w$
for the groups, and $H_3$, $s_3$ for the sum or difference.

Then

$$H_3 = \bar u \pm \bar v \pm \bar w, \qquad\ldots\qquad (35)$$

as can be readily shown.

We can obtain $s_3$ by supposing $u_s$ and $v_t$ first combined, and
then a w added, and using the formula already proved twice over.

$$s_3^{2} = s_2^{2} + \sigma_w^{2} = \sigma_u^{2} + \sigma_v^{2} + \sigma_w^{2} \qquad\ldots\qquad (36)$$

and the formula can be extended by induction to any number of
groups.
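Formulae (33) and (34) can be verified directly by forming every possible double selection from two small groups. The sketch below does so; the two groups of measurements are hypothetical figures chosen only for illustration.

```python
from itertools import product
from statistics import fmean, pvariance

# Two hypothetical frequency groups (illustrative values, not from the text).
u = [2, 3, 7, 8]
v = [10, 14, 15]

# Every double selection occurs with equal frequency, as supposed in the text.
sums = [a + b for a, b in product(u, v)]
differences = [a - b for a, b in product(u, v)]

print("average of sums      :", fmean(sums), "=", fmean(u) + fmean(v))
print("variance of sums     :", pvariance(sums), "=", pvariance(u) + pvariance(v))
print("variance of differences:", pvariance(differences))
```

The squares of the standard deviations are added whether the sum or the difference is taken, while the averages are added or subtracted accordingly.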
A very important case is when the standard deviations of
the original groups are equal, so that $\sigma_u = \sigma_v = \ldots = \sigma$, say.

If the sum is formed from n such groups, and its standard
deviation is s, we have

$$s^{2} = s_n^{2} = \sigma^{2} + \sigma^{2} + \ldots \text{ to } n \text{ terms} = n\sigma^{2}$$
$$\text{and}\quad s = \sigma\cdot\sqrt{n} \qquad\ldots\qquad (37)$$

Next, instead of taking the sum of the n measurements, let
us take their average. Every term in the composite group is
then to be divided by n, and therefore the standard deviation
of the group of averages, $\sigma_a$ say, will be the standard deviation
of the group of sums divided by n.

$$\sigma_a = \frac{s}{n} = \frac{\sigma}{\sqrt n}$$

Finally, if the average is taken of n items, all selected independently
from the same (indefinitely large) initial group, so
that the chance of selecting any one of the n items is not
affected by previous selections, we have still $\sigma_a = \dfrac{\sigma}{\sqrt n}$.

In the following paragraphs it is assumed that the original
measurements are all from the averages of their groups, and
that therefore $0 = \bar u = \bar v = \ldots$, and $0 = Su = Sv = \ldots$


The mean cube for the sum of $u_s$ and $v_t$ is

$$\frac{1}{m_1 m_2}\,S(u_s + v_t)^{3}
= \frac{1}{m_1 m_2}\{m_2\,Su^{3} + m_1\,Sv^{3} + 3\cdot Sv\,Su^{2} + 3\cdot Su\,Sv^{2}\}
= \frac{1}{m_1}Su^{3} + \frac{1}{m_2}Sv^{3}$$

$= {}_u\mu_3 + {}_v\mu_3$, the sum of the third moments about the
average of the groups.

Hence $M_3$, the third moment of the sum of n items, all
from one group, $= n\mu_3$, where $\mu_3$ is the third moment of the
group, and for the sum

$$\frac{M_3}{s^{3}} = \frac{n\mu_3}{n^{\frac32}\sigma^{3}} = \frac{\kappa}{\sqrt n},$$

where κ is for the group the value of “κ” as defined in
formula (8).

κ is the same for the sum and for the average of n items.
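The rules for the standard deviation and for κ of a sum or average of n items can be checked by exhaustive enumeration. In the sketch below the group of measurements is hypothetical, the helper `moments` is introduced for illustration, and κ is taken as $\mu_3/\sigma^{3}$, which is the reading implied by the relation $M_3/s^{3} = \kappa/\sqrt n$ above (formula (8) itself lies outside this section).

```python
from itertools import product
from math import sqrt


def moments(values):
    """Return the average and the second and third moments about the average."""
    mean = sum(values) / len(values)
    mu2 = sum((x - mean) ** 2 for x in values) / len(values)
    mu3 = sum((x - mean) ** 3 for x in values) / len(values)
    return mean, mu2, mu3


group = [0, 1, 1, 2, 6]   # hypothetical skew group, for illustration only
n = 3                     # number of independent selections

# All equally likely selections of n items, with their sums and averages.
sums = [sum(choice) for choice in product(group, repeat=n)]
averages = [total / n for total in sums]

_, g2, g3 = moments(group)
_, s2, s3 = moments(sums)
_, a2, a3 = moments(averages)

kappa_group = g3 / g2 ** 1.5
kappa_sum = s3 / s2 ** 1.5
kappa_average = a3 / a2 ** 1.5

print(sqrt(s2), sqrt(g2) * sqrt(n))          # s = sigma * sqrt(n)
print(sqrt(a2), sqrt(g2) / sqrt(n))          # sigma_a = sigma / sqrt(n)
print(kappa_sum, kappa_group / sqrt(n))      # kappa of the sum = kappa / sqrt(n)
```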
Genesis of the Curve of Error.


We now proceed to the analysis which leads to the application
of the curve of error. A quite simple case, which links up
the  two  parts  of  this  chapter,  is  as  follows. 

If  the  original  groups  are  represented  by  normal  curves  of 
error,  it  can  be  shown  that  their  sum  and  average  are  also 
normal. 

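For, if we write $x_t = u_t + v_t$, the chance of the concurrence
of values of the parts $u_t$, $v_t$ is

$$\frac{1}{\sigma_u\sqrt{2\pi}}\,e^{-\frac{u_t^{2}}{2\sigma_u^{2}}} \times \frac{1}{\sigma_v\sqrt{2\pi}}\,e^{-\frac{v_t^{2}}{2\sigma_v^{2}}}
= \frac{1}{2\pi\sigma_u\sigma_v}\,e^{-\frac{u_t^{2}}{2\sigma_u^{2}}-\frac{(x_t-u_t)^{2}}{2\sigma_v^{2}}}.$$

The whole chance of $x_t\,(+\,\delta x)$ is obtainable by integrating
this expression for all values of u, and equals

$$\frac{1}{2\pi\sigma_u\sigma_v}\int_{-\infty}^{\infty}
e^{-\frac{\sigma_u^{2}+\sigma_v^{2}}{2\sigma_u^{2}\sigma_v^{2}}\left(u-\frac{\sigma_u^{2}x_t}{\sigma_u^{2}+\sigma_v^{2}}\right)^{2}}
\cdot e^{-\frac{x_t^{2}}{2(\sigma_u^{2}+\sigma_v^{2})}}\,du\cdot\delta x$$

$$= \left(\text{writing } u' \text{ for } u-\frac{\sigma_u^{2}x_t}{\sigma_u^{2}+\sigma_v^{2}}\right)\quad
\frac{1}{2\pi\sigma_u\sigma_v}\,e^{-\frac{x_t^{2}}{2(\sigma_u^{2}+\sigma_v^{2})}}\cdot\delta x
\int_{-\infty}^{\infty} e^{-\frac{\sigma_u^{2}+\sigma_v^{2}}{2\sigma_u^{2}\sigma_v^{2}}\,u'^{2}}\,du'$$

$$= (\text{using formula p. 268})\quad \frac{1}{s_2\sqrt{2\pi}}\,e^{-\frac{x_t^{2}}{2s_2^{2}}}\cdot\delta x,
\quad\text{where } s_2^{2} = \sigma_u^{2}+\sigma_v^{2}.$$

The chance of the value x is, therefore,

$$\frac{1}{s_2\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2s_2^{2}}} \qquad\ldots\qquad (40)$$

The process is easily generalised by induction, and the
chances of obtaining x from the sum and average of n independent
selections from a normal curve whose standard deviation
is σ are respectively

$$\frac{1}{\sigma\sqrt{2\pi n}}\,e^{-\frac{x^{2}}{2n\sigma^{2}}}
\quad\text{and}\quad
\frac{\sqrt n}{\sigma\sqrt{2\pi}}\,e^{-\frac{nx^{2}}{2\sigma^{2}}} \qquad\ldots\qquad (41)$$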
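The integration leading to formula (40) may be confirmed numerically. The following is a sketch only: the standard deviations 1.5 and 2.0 are hypothetical, and the integral over u is replaced by a simple midpoint sum over a wide finite range, which is an approximation adopted here and not the method of the text.

```python
import math


def normal(x, sd):
    """Ordinate of the normal curve of error with standard deviation sd."""
    return math.exp(-x * x / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))


def chance_of_sum(x, sd_u, sd_v, half_width=12.0, steps=4000):
    """Midpoint-rule value of the integral over u of
    normal(u, sd_u) * normal(x - u, sd_v)."""
    top = half_width * max(sd_u, sd_v)
    h = 2 * top / steps
    total = 0.0
    for i in range(steps):
        u = -top + (i + 0.5) * h
        total += normal(u, sd_u) * normal(x - u, sd_v)
    return total * h


sd_u, sd_v = 1.5, 2.0                      # hypothetical standard deviations
s2 = math.sqrt(sd_u ** 2 + sd_v ** 2)      # formula (34): here 2.5

for x in (0.0, 1.0, 3.0):
    print(x, chance_of_sum(x, sd_u, sd_v), normal(x, s2))
```

The two columns printed should agree closely, the second being the ordinate of formula (40).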


The same result is obtainable as a first approximation when
the original curves are not normal, but satisfy certain conditions
which are obtained in the analysis. The result is so
important that two proofs are given in the following paragraphs.


Proof  by  the  Multinomial  Theorem. 


In  this  proof  it  is  shown  that  the  moments  obtained  by 

an  extension  of  the  method  of  the  preceding  paragraphs 

(formulae  (33)  to  (39))  are  for  all  orders  the  same  as  those  of  a 
normal  curve  of  error  with  appropriate  standard  deviation. 

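Let there be n elemental groups containing $m_1, m_2 \ldots m_n$
measurable things respectively; in any, the t-th, group, let
the average, the standard deviation and the moments about
the average be $\bar u_t$, $\sigma_t$, ${}_t\mu_2$, ${}_t\mu_3 \ldots$, and let the items be

$$\bar u_t + {}_tu_1,\ \bar u_t + {}_tu_2,\ \ldots,\ \bar u_t + {}_tu_s \ldots$$

Then ${}_tu_1 + {}_tu_2 + \ldots + {}_tu_s + \ldots = 0$.

One item is selected at random from each group, and n
such items are added; it is assumed that the selections from
different groups are independent of each other, and that the
chance of obtaining a particular magnitude from one group is
not affected by previous selections.

In the s-th selection the sum is $H + E_s$, where

$H = \bar u_1 + \bar u_2 + \ldots + \bar u_n$ and $E_s = {}_1u_s + {}_2u_s + \ldots + {}_nu_s$.

Let $s, M_2, M_3 \ldots$ be the standard deviation and moments of
the frequency curve of $E_s$, that is, of the frequency curve of
the sum.

$M_2 = s^{2}$ = mean of all possible values of $({}_1u_s + {}_2u_s + \ldots + {}_nu_s)^{2}$.

There are $m_1 \times m_2 \times \ldots \times m_n = N$, say, such values. Then,
generalising the process of p. 288,

$$M_2 = s^{2} = \frac{1}{N}\left\{\frac{N}{m_1}\,S\,{}_1u^{2} + \frac{N}{m_2}\,S\,{}_2u^{2} + \ldots\right\}
+ \frac{2}{N}\left\{\frac{N}{m_1 m_2}\,S\,{}_1u\cdot S\,{}_2u + \ldots\right\}$$

$$= \sigma_1^{2} + \sigma_2^{2} + \ldots + \sigma_n^{2},\quad\text{since } 0 = S\,{}_1u = S\,{}_2u = \ldots$$

Similarly

$$M_3 = \frac{1}{N}\,S({}_1u_s + {}_2u_s + \ldots + {}_nu_s)^{3}$$

$$= \frac{1}{N}\left\{\frac{N}{m_1}\,S\,{}_1u^{3} + \ldots\right\}
+ \frac{3}{N}\left\{\frac{N}{m_1 m_2}\,S\,{}_1u^{2}\cdot S\,{}_2u + \ldots\right\}
+ \frac{6}{N}\left\{\frac{N}{m_1 m_2 m_3}\,S\,{}_1u\cdot S\,{}_2u\cdot S\,{}_3u + \ldots\right\}$$

$$= {}_1\mu_3 + {}_2\mu_3 + \ldots + {}_n\mu_3,\quad\text{since } 0 = S\,{}_1u, \text{ etc.} \qquad\ldots\qquad (42)$$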
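and

$$M_4 = \frac{1}{N}\left\{\frac{N}{m_1}\,S\,{}_1u^{4} + \ldots\right\}
+ \frac{4}{N}\left\{\frac{N}{m_1 m_2}\,S\,{}_1u^{3}\cdot S\,{}_2u + \ldots\right\}
+ \frac{12}{N}\left\{\frac{N}{m_1 m_2 m_3}\,S\,{}_1u^{2}\cdot S\,{}_2u\cdot S\,{}_3u + \ldots\right\}$$
$$\qquad + \frac{6}{N}\left\{\frac{N}{m_1 m_2}\,S\,{}_1u^{2}\cdot S\,{}_2u^{2} + \ldots\right\}
+ \frac{24}{N}\left\{\frac{N}{m_1 m_2 m_3 m_4}\,S\,{}_1u\cdot S\,{}_2u\cdot S\,{}_3u\cdot S\,{}_4u + \ldots\right\}$$

$$= {}_1\mu_4 + {}_2\mu_4 + \ldots + {}_n\mu_4 + 6(\sigma_1^{2}\sigma_2^{2} + \sigma_1^{2}\sigma_3^{2} + \ldots) \qquad\ldots\qquad (43)$$

$$M_4 - 3s^{4} = S\,{}_t\mu_4 + 6(\sigma_1^{2}\sigma_2^{2} + \ldots) - 3(\sigma_1^{2} + \sigma_2^{2} + \ldots)^{2} = S({}_t\mu_4 - 3\sigma_t^{4}).$$

If the standard deviations and moments of the elemental
curves are equal, so that $\sigma_1 = \sigma_2 = \ldots = \sigma$, ${}_1\mu_3 = {}_2\mu_3 = \ldots = \mu_3$
etc., we have

$$s \text{ for the sum} = \sigma\sqrt{n},\qquad \sigma_a \text{ for the average} = \frac{\sigma}{\sqrt n} \qquad\ldots\qquad (44)$$

$$\kappa, \text{ for sum or average}, = \frac{M_3}{s^{3}} = \frac{n\mu_3}{n^{\frac32}\sigma^{3}} = \frac{\kappa}{\sqrt n} \qquad\ldots\qquad (45)$$

where the κ on the right is the “κ” for the elemental curves,

$$\kappa_2, \text{ for the sum}, = \frac{M_4}{s^{4}} - 3 = \frac{n(\mu_4 - 3\sigma^{4})}{n^{2}\sigma^{4}} = \frac{1}{n}\left(\frac{\mu_4}{\sigma^{4}} - 3\right)
= \frac{1}{n^{2}} \times \text{sum of the “}\kappa_2\text{’s” for elemental curves} \qquad\ldots\qquad (46)$$

Hence κ tends to zero as $\sqrt n$ becomes large, and $\kappa_2$ may be taken
as zero if $\dfrac{1}{n}$ is negligible.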

To find higher moments, we need to evaluate $M_t$ for any
integer t; that is the mean of $({}_1u + {}_2u + \ldots + {}_nu)^{t}$, which
(by the multinomial theorem *) is the mean of

$$\frac{t!}{n_1!\,n_2!\ldots}\cdot{}_1u^{n_1}\cdot{}_2u^{n_2}\ldots$$

when all possible terms subject to the condition $n_1 + n_2 + \ldots = t$
are summed.

* The multinomial theorem is an extension of the binomial theorem; the
following is an outline of the proof.

The product of t factors

$$(a_1+b_1+c_1+\ldots)(a_2+b_2+c_2+\ldots)\ldots(a_t+b_t+c_t+\ldots)$$

= the sum of all possible terms such as $a_1 a_2 b_3 c_4 d_5 b_6 \ldots k_t$, each suffix
occurring once.

The number of such terms in which an a occurs $n_1$ times, a b $n_2$ times ...
is the number of permutations of t things taken altogether in which $n_1$ are
alike, $n_2$ alike ... i.e. $\dfrac{t!}{n_1!\,n_2!\ldots}$. Now write $a_1 = a_2 = \ldots = {}_1u$, $b_1 = b_2 = \ldots = {}_2u$,
etc., and we obtain the result in the text.

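First take the case where t is even.

$M_{2t}$ = sum of means of the terms $\dfrac{(2t)!}{n_1!\,n_2!\ldots}\,{}_1u^{n_1}\cdot{}_2u^{n_2}\ldots$
when $n_1 + n_2 + \ldots = 2t$

= sum of terms $\dfrac{(2t)!}{n_1!\,n_2!\ldots}\cdot$ mean ${}_1u^{n_1}$ × mean ${}_2u^{n_2}$ × ...

since from the independence of selection from the different
elemental curves each ${}_1u$ occurs with each ${}_2u$ ... with equal
frequency.

$$\therefore\ M_{2t} = \text{sum of terms } \frac{(2t)!}{n_1!\,n_2!\ldots}\cdot{}_1\mu_{n_1}\cdot{}_2\mu_{n_2}\ldots$$

Now restrict the analysis to the case where the standard
deviations and moments of the elemental curves are equal, so
that ${}_1\mu_{n_1} = {}_2\mu_{n_1} = \ldots = \mu_{n_1}$, etc.

Let there be f factors ${}_1\mu_{n_1}, {}_2\mu_{n_2} \ldots$, in any selected term. Then
such a term occurs ${}_nC_f$ times in the various guises

${}_1\mu_{n_1}\times{}_2\mu_{n_2}\times{}_3\mu_{n_3}\ldots$, ${}_1\mu_{n_2}\times{}_2\mu_{n_1}\times{}_3\mu_{n_3}\ldots$, ${}_1\mu_{n_3}\times{}_2\mu_{n_2}\times{}_3\mu_{n_1}\ldots$,

each of which is identical with

$$\mu_{n_1}\times\mu_{n_2}\times\mu_{n_3}\ldots$$

$$\text{Hence}\quad M_{2t} = \text{Sum of terms } \frac{(2t)!}{n_1!\,n_2!\ldots}\cdot{}_nC_f\cdot\mu_{n_1}\times\mu_{n_2}\times\ldots$$

where all values are taken subject to the condition

$$n_1 + n_2 + \ldots = 2t.$$

Now $s^{2} = n\sigma^{2}$, $s^{2t} = n^{t}\cdot\sigma^{2t}$, and

$${}_nC_f = n(n-1)\ldots(n-f+1)/f!$$

$$\therefore\ \frac{M_{2t}}{s^{2t}} = \text{sum of terms } \frac{(2t)!}{n_1!\,n_2!\ldots}\cdot\frac{1}{f!}\cdot\frac{n^{f}}{n^{t}}
\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\ldots\left(1-\frac{f-1}{n}\right)
\cdot\frac{\mu_{n_1}}{\sigma^{n_1}}\cdot\frac{\mu_{n_2}}{\sigma^{n_2}}\ldots$$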
It is now necessary to restrict the elemental curves so as to
satisfy the condition that $\dfrac{\mu_p}{\sigma^{p}}$ is finite for all values of p,
that is, that the mean of $\left(\dfrac{u}{\sigma}\right)^{p}$ is finite, or that the effective range of the
curve is comparable with its standard deviation. We have then
to consider which of the possible terms is finite and which of
order $\dfrac{1}{n}$ or higher.

Since $\mu_1$ is 0, each of $n_1, n_2$, etc. is 2 or more in every term
that is not identically zero; hence since the sum of the f terms
$n_1, n_2 \ldots$ is 2t, the greatest possible number of such terms
is t and $f \not> t$.

If f < t, the fraction $\dfrac{n^{f}}{n^{t}}$ is of order $\dfrac{1}{n}$ or higher.

If f = t, then $2 = n_1 = n_2 \ldots$, and we obtain, as the only term
when $\dfrac{1}{n}$ is neglected,

$$\frac{(2t)!}{(2!)^{t}}\cdot\frac{1}{t!}\cdot\frac{n^{t}}{n^{t}}
\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\ldots\left(1-\frac{t-1}{n}\right)\left(\frac{\mu_2}{\sigma^{2}}\right)^{t}
= \frac{(2t)!}{2^{t}\,t!},$$

since $\mu_2 = \sigma^{2}$ and $\left(1-\dfrac{1}{n}\right)\left(1-\dfrac{2}{n}\right)\ldots\left(1-\dfrac{t-1}{n}\right)$
is between 1 and $1-\dfrac{t(t-1)}{2n}$.

Hence

$$M_{2t} = s^{2t}\cdot\frac{(2t)!}{2^{t}\,t!} = 1\cdot3\cdot5\ldots(2t-1)\,s^{2t} \qquad\ldots\qquad (47)$$

when terms involving $\dfrac{1}{n}$ are neglected.

By a similar argument

$$\frac{M_{2t+1}}{s^{2t+1}} = \text{Sum of terms } \frac{(2t+1)!}{n_1!\,n_2!\ldots}\cdot\frac{1}{f!}\cdot\frac{n^{f}}{n^{t+\frac12}}
\left(1-\frac{1}{n}\right)\ldots\left(1-\frac{f-1}{n}\right)
\cdot\frac{\mu_{n_1}}{\sigma^{n_1}}\cdot\frac{\mu_{n_2}}{\sigma^{n_2}}\ldots$$

Here there is no term not involving a power of n in the
denominator, and the greatest term is found when one of the
quantities $n_1, n_2 \ldots = 3$, and each of the others = 2; so that

$$2t + 1 = n_1 + n_2 + \ldots = 2(f-1) + 3 = 2f + 1, \text{ and } f = t,$$

and we have f equal terms obtained by putting $n_1, n_2 \ldots$
successively = 3.
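Then

$$\frac{M_{2t+1}}{s^{2t+1}} = \frac{(2t+1)!}{2^{t-1}\cdot 3!}\cdot\frac{t}{t!}\cdot\frac{1}{\sqrt{n}}\cdot\frac{\mu_3}{\sigma^3} = \frac{t}{3}\cdot 1\cdot 3\cdot 5\ldots(2t+1)\cdot\frac{\mu_3}{\sqrt{n}\,\sigma^3},$$

so that

$$M_{2t+1} = 0 \text{ if terms involving } \frac{1}{\sqrt{n}} \text{ are neglected,} \quad\ldots\quad (48)$$

and

$$M_{2t+1} = \frac{t}{3}\cdot 1\cdot 3\cdot 5\ldots(2t+1)\cdot M_3\cdot s^{2t-2}, \quad\ldots\quad (49)$$

since $\frac{M_3}{s^3} = \frac{\mu_3}{\sqrt{n}\,\sigma^3}$, if terms in $\frac{1}{\sqrt{n}}$ are retained, and terms in
$\frac{1}{n}$ neglected.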


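These moments are (see formula (23) and Appendix, Note 6)
precisely those which are obtained from the curve

$$y = \frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}$$

if $\frac{1}{\sqrt{n}}$ is neglected, and from the curve

$$y = \frac{1}{s\sqrt{2\pi}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{1}{3}\cdot\frac{x^3}{s^3}\right)\right\}e^{-\frac{x^2}{2s^2}}$$

if $\frac{1}{\sqrt{n}}$ is retained and $\frac{1}{n}$ neglected, where $\kappa = \frac{M_3}{s^3}$.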

Hence,  if  we  may  take  identity  of  standard  deviations  and 
of  all  moments  as  implying  identity  of  curves,  these  equations 
are  the  first  and  second  approximations  to  the  curve  of 
frequency  required. 
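
The statement that the skew curve reproduces the moments (47) and (49) can be checked numerically. The following is only a sketch: the values of s and κ are arbitrary illustrations, and the integration routine (a plain trapezoidal rule over a wide range) is an assumption of this example, not part of the original argument.

```python
import math

def skew_density(x, s, kappa):
    """Second approximation: normal curve times {1 - (kappa/2)(x/s - x^3/(3 s^3))}."""
    z = x / s
    normal = math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return normal * (1.0 - 0.5 * kappa * (z - z ** 3 / 3.0))

def moment(p, s, kappa, half_width=12.0, steps=24000):
    """p-th moment about the origin, trapezoidal rule on [-half_width*s, +half_width*s]."""
    a = -half_width * s
    h = 2.0 * half_width * s / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** p * skew_density(x, s, kappa)
    return total * h

def odd_product(m):
    """1 * 3 * 5 * ... * m for odd m."""
    out = 1
    for k in range(1, m + 1, 2):
        out *= k
    return out

s, kappa = 2.0, 0.3          # illustrative values only
M3 = kappa * s ** 3          # third moment of the curve
for t in (1, 2, 3):
    even_formula = odd_product(2 * t - 1) * s ** (2 * t)                       # formula (47)
    odd_formula = (t / 3.0) * odd_product(2 * t + 1) * M3 * s ** (2 * t - 2)   # formula (49)
    print(t, moment(2 * t, s, kappa), even_formula,
          moment(2 * t + 1, s, kappa), odd_formula)
```

The even moments are unaffected by κ, since the correcting factor is an odd function of x; only the odd moments distinguish the second approximation from the first.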


Professor Edgeworth's Proof.

The  proof  given  by  Professor  Edgeworth  (“  Law  of  Error.” 
Camb.  Phil.  Trans.,  Vol.  XX.,  Part  I.,  1904)  is  briefer  and  more 
general,  but  it  involves  rather  more  difficult  mathematical 
conceptions,  which  it  was  the  intention  of  the  analysis  just 
given  (which  is  essentially  based  on  Edgeworth’s  work)  to 
avoid. 

Edgeworth  gives  a  formula  for  any  number  of  successive 
approximations,  but  the  outline  which  follows  is  confined  to 
the  first  two. 

With  the  same  notation  and  conditions  as  before, 
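Let $E_s = {}_1u_s + {}_2u_s + \ldots + {}_nu_s$.

Let $\alpha$ be any fixed small quantity, only used to select terms of
the same dimensions.

Then $e^{\alpha E_s} = e^{\alpha\cdot{}_1u_s}\cdot e^{\alpha\cdot{}_2u_s}\cdot e^{\alpha\cdot{}_3u_s}\ldots$ identically.

The mean value of $e^{\alpha\cdot{}_1u_s}$, that is the mean of

$$\left(1 + \alpha\cdot{}_1u_s + \frac{\alpha^2}{2}\cdot{}_1u_s^2 + \frac{\alpha^3}{3!}\cdot{}_1u_s^3 + \ldots\right)$$

$$= 1 + \alpha\cdot{}_1\mu_1 + \frac{\alpha^2}{2}\cdot{}_1\mu_2 + \frac{\alpha^3}{3!}\cdot{}_1\mu_3 + \ldots,$$

where ${}_1\mu_1 = 0$.

Since the selections from the different elemental curves are
independent, the mean of the product $e^{\alpha\cdot{}_1u_s}\times e^{\alpha\cdot{}_2u_s}\times\ldots$ = the
product of their means.

$$\therefore\ 1 + \alpha M_1 + \frac{\alpha^2}{2}M_2 + \frac{\alpha^3}{3!}M_3 + \ldots = \text{Product of } n \text{ factors such as } \left(1 + \frac{\alpha^2}{2}\cdot{}_t\mu_2 + \frac{\alpha^3}{3!}\cdot{}_t\mu_3 + \ldots\right)$$

$$\therefore\ \log\left(1 + \alpha M_1 + \frac{\alpha^2}{2}M_2 + \ldots\right) = \sum_{t=1}^{t=n}\log\left(1 + \frac{\alpha^2}{2}\cdot{}_t\mu_2 + \frac{\alpha^3}{3!}\cdot{}_t\mu_3 + \ldots\right)$$

$$= \frac{\alpha^2}{2}\,S\,{}_t\mu_2 + \frac{\alpha^3}{3!}\,S\,{}_t\mu_3 + \frac{\alpha^4}{4!}\left(S\,{}_t\mu_4 - 3\,S\,({}_t\mu_2)^2\right) + \ldots$$

$$\therefore\ 1 + \alpha M_1 + \frac{\alpha^2}{2}M_2 + \ldots = e^{\frac{\alpha^2}{2}S\,{}_t\mu_2}\cdot e^{\frac{\alpha^3}{3!}S\,{}_t\mu_3}\cdot e^{\frac{\alpha^4}{4!}\left(S\,{}_t\mu_4 - 3S({}_t\mu_2)^2\right)}\ldots$$

$$= \left(1 + \frac{\alpha^2}{2}S\,{}_t\mu_2 + \ldots + \frac{1}{t!}\left(\frac{\alpha^2}{2}S\,{}_t\mu_2\right)^t + \ldots\right)\cdot\left(1 + \frac{\alpha^3}{3!}S\,{}_t\mu_3 + \ldots\right)\cdot\left(1 + \frac{\alpha^4}{4!}\left\{S\,{}_t\mu_4 - 3\,S\,({}_t\mu_2)^2\right\} + \ldots\right)\ldots$$

Equate coefficients up to $\alpha^4$.

$M_1 = 0$.

$s^2 = M_2 = S\,{}_t\mu_2 = S\,\sigma_t^2 = n\sigma^2$, if $\sigma^2$ is the mean of $\sigma_1^2, \sigma_2^2 \ldots$

$M_3 = S\,{}_t\mu_3 = n\mu_3$, if $\mu_3$ is mean of ${}_1\mu_3, {}_2\mu_3 \ldots$

$$\frac{M_4}{4!} = \frac{1}{2!}\cdot\left(\frac{1}{2}\cdot S\,{}_t\mu_2\right)^2 + \frac{1}{4!}\left(S\,{}_t\mu_4 - 3\,S\,({}_t\mu_2)^2\right)$$

$$M_4 - 3s^4 = S\left({}_t\mu_4 - 3\sigma_t^4\right)$$

$$\kappa_2 = \frac{M_4}{s^4} - 3 = \frac{S\left(\frac{{}_t\mu_4}{\sigma_t^4} - 3\right)\sigma_t^4}{n^2\sigma^4} = \frac{1}{n}\,\kappa_2', \text{ if } \kappa_2' \text{ is mean } \left(\frac{{}_t\mu_4}{\sigma_t^4} - 3\right)\frac{\sigma_t^4}{\sigma^4}$$

$$\kappa = \frac{M_3}{s^3} = \frac{n\mu_3}{n^{\frac32}\sigma^3} = \frac{\kappa'}{\sqrt{n}}, \text{ where } \kappa' = \frac{\mu_3}{\sigma^3}.$$

Hence

$$1 + \frac{\alpha^2}{2}s^2 + \frac{\alpha^3}{3!}M_3 + \frac{\alpha^4}{4!}M_4 \ldots = e^{\frac12\alpha^2s^2}\cdot\left(1 + \frac{1}{6}\alpha^3 s^3\,\kappa + \ldots\right)\cdot\left(1 + \frac{1}{24}\alpha^4 s^4\cdot\frac{1}{n}\kappa_2' + \ldots\right)\ldots$$

On the right-hand side of the equation in every case the
index of $\alpha$ equals the suffix of $\mu$ or the sums of the suffixes
of powers or products of $\mu$'s.

Now assume that throughout the elemental curves $\frac{\mu_p}{\sigma^p}$ is
finite for all values of $p$, and it results that the coefficient of
$\alpha^p\cdot s^p$ contains the factor $\frac{1}{n^{\frac{p}{2}-1}}$, as has been worked out above
up to $p = 4$.

Neglect $\frac{1}{n^{\frac12}}$ and all higher powers,

$$1 + \frac{\alpha^2}{2}s^2 + \frac{\alpha^3}{3!}M_3 + \ldots + \frac{\alpha^p}{p!}M_p + \ldots = e^{\frac12\alpha^2s^2} = 1 + \frac{1}{2}\alpha^2s^2 + \ldots + \frac{1}{t!}\,\alpha^{2t}\cdot\frac{s^{2t}}{2^t} + \ldots$$

$\therefore$ every odd moment, $M_{2t+1}$, $= 0$,

$$\text{and an even moment, } M_{2t}, = \frac{(2t)!}{t!}\cdot\frac{s^{2t}}{2^t} = 1\cdot 3\ldots(2t-1)\cdot s^{2t} \quad\ldots\quad (50)$$

as in the normal curve of error (formula (23)).

Now retain $\frac{1}{\sqrt{n}}$ and neglect $\frac{1}{n}$.

$M_{2t}$ is as before.

$$\frac{M_{2t+1}}{(2t+1)!} = \frac{s^{2t-2}}{(t-1)!\,2^{t-1}}\cdot\frac{M_3}{6}$$

$$M_{2t+1} = \frac{(2t+1)!}{(t-1)!\,2^{t-1}}\cdot\frac{1}{6}\,M_3\cdot s^{2t-2} = \frac{t}{3}\cdot 1\cdot 3\cdot 5\ldots(2t+1)\cdot M_3\cdot s^{2t-2},$$

that is, the $(2t+1)$th moment of the curve

$$y = \frac{1}{s\sqrt{2\pi}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{1}{3}\cdot\frac{x^3}{s^3}\right)\right\}e^{-\frac{x^2}{2s^2}} \quad\ldots\quad (51)$$

(see Appendix, Note 6).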
Hence, by the test of equality of moments, the curve of
frequency of the sum or average of n selections under the given
conditions has for its first approximation the normal curve,
when $\frac{1}{\sqrt{n}}$ is neglected, and for its second approximation the
skew curve already given.

Further  approximations,  which  so  far  have  been  found 
mainly  of  theoretic  interest  only,  are  given  by  Edgeworth. 
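
The relations obtained by equating coefficients, M₂ = Sμ₂, M₃ = Sμ₃ and M₄ − 3s⁴ = S(μ₄ − 3σ⁴), can be verified exactly on a small case. In the sketch below the three elemental distributions are invented purely for illustration; exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction
from itertools import product

def central_moments(dist):
    """Return (mu2, mu3, mu4) about the mean for a distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return tuple(sum((v - mean) ** k * p for v, p in dist.items()) for k in (2, 3, 4))

def convolve(d1, d2):
    """Distribution of the sum of two independent selections."""
    out = {}
    for (v1, p1), (v2, p2) in product(d1.items(), d2.items()):
        out[v1 + v2] = out.get(v1 + v2, 0) + p1 * p2
    return out

# Hypothetical elemental groups (not taken from the text).
elements = [
    {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)},
    {0: Fraction(1, 4), 2: Fraction(3, 4)},
    {-1: Fraction(1, 5), 0: Fraction(2, 5), 3: Fraction(2, 5)},
]

total = {0: Fraction(1)}
for d in elements:
    total = convolve(total, d)

M2, M3, M4 = central_moments(total)
parts = [central_moments(d) for d in elements]

sum_mu2 = sum(m2 for m2, _, _ in parts)
sum_mu3 = sum(m3 for _, m3, _ in parts)
sum_excess = sum(m4 - 3 * m2 ** 2 for m2, _, m4 in parts)

print(M2 == sum_mu2, M3 == sum_mu3, M4 - 3 * M2 ** 2 == sum_excess)
```

Each comparison is between the moment of the total, found by direct enumeration, and the corresponding sum over the elemental groups.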

Statement of the Generalised Law of Error, or the Law of Great Numbers.

The  theorems  now  proved  can  be  summarised  as  follows, 
the  conditions  of  validity  being  restated  and  amplified. 

Let  there  be  a  large  number  (n)  of  elemental  groups,  each 
of  which  can  be  represented  by  a  frequency  locus,  such  that 
the  chance  of  obtaining  a  magnitude  U  by  selection  from  a 
group  is  a  function  of  U. 

Form a total, H, of n things, one selected from each group,
so that the selection from one group has no (or very slight)
effect on the selection from another; and obtain many values
of H by repeating the process, in such a way that the selections
which make one value of H are not affected by the
selections which make other values.*

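Then if the frequency loci of the elemental groups satisfy
certain conditions, the frequency locus of H has a definite form
to which

$$y = \frac{1}{s\sqrt{2\pi}}\,e^{-\frac{x^2}{2s^2}}$$

is a first, and

$$y = \frac{1}{s\sqrt{2\pi}}\left\{1 - \frac{\kappa}{2}\left(\frac{x}{s} - \frac{1}{3}\cdot\frac{x^3}{s^3}\right)\right\}e^{-\frac{x^2}{2s^2}}$$

is a second approximation, where $s^2$ is the second moment and
$\kappa s^3$ the third moment of the locus.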

The frequency locus of the average of the n magnitudes is
of the same form as that of the sum, and κ has the same value in
both cases. If $s_a$ is the standard deviation of the average,
$s_a = \frac{s}{n}$, where $s = \sigma\sqrt{n}$ and $s_a = \frac{\sigma}{\sqrt{n}}$ if σ is typical of the
standard deviation of the elemental curves. κ is of the order
$\frac{1}{\sqrt{n}}$ in comparison with 1 in the frequency equation of H, and
only the first approximation is necessary when n is very
great or when the elemental curves are symmetrical, in which
case κ = 0.

* If the selections from one elemental group are not independent, but the
magnitudes tend to come in batches, then more values of H are necessary
to obtain any given approximation to its final frequency form when an
indefinitely large number of values are taken.

The condition that must be satisfied by the elemental
curves is that, if $\mu_p$ is the pth moment and σ the standard
deviation of any one of them, $\frac{\mu_p}{\sigma^p}$ is a small finite number that
can be neglected when multiplied by $\frac{1}{n}$ for all values of p;
this is secured when the great bulk of the frequency curve is on
a base containing only a small multiple (1, 2 or 3) of its standard
deviation to left and right of its average. This condition is
quite generally satisfied by ordinary frequency groups when
n is at all large.

The first and the second approximations are only valid for
moderate values of $\frac{x}{s}$, since beyond these the contributions of
further approximations become sensible; it is only the
central portion of the frequency curve of H so generated that
is determinable; the outer portions have no general form, and
it can only be postulated that their aggregate volume is small,
and that the chance of exceeding, say, 3s is negligible. The
range that is to be understood by "the central portion"
depends on the value of n; as the number of independent
elements increases, so the range of the determinable form
extends. In ordinary cases with n as great as 100 it may
perhaps be said that the frequency curve is known over a range
of 2s on either side of the origin.

It  follows  that  the  applicability  of  the  law  of  error  to  given 
observations  should  not  be  denied  on  the  ground  that  the 
positions  of  extreme  values  do  not  conform  to  the  law. 


*  More  exactly  it  is  only  the  difference  between  this  ratio  and  the  corre¬ 
sponding  ratio  in  a  normal  curve  that  is  involved. 
Case when the Universe is Limited.

On  pp.  287  seq.  it  was  assumed  that  the  selection  of  one 
item  did  not  affect  the  chance  of  further  selections. 

As  on  p.  282,  we  will  now  examine  the  case  where  the 
universe  from  which  the  selection  is  made  is  limited,  so  far  as 
the  determination  of  the  average  is  concerned. 

Let a group of n things be selected at random from a group of
N things, whose measurements are $u + u_1$, $u + u_2 \ldots u + u_N$, where
u is the average and $\sum_1^N u_t = 0$. Write H + E for the sum of the
measurements of the n selected things, where H = nu.

There are ${}^NC_n$ equally probable values of E, such as

$$u_1 + u_2 + u_3 + \ldots + u_n,$$
$$u_1 + u_3 + u_4 + \ldots + u_{n+1}.$$

The sum of the values is easily seen to be zero, and therefore
the mean value of E = 0.

Let s be the standard deviation of E.

Then ${}^NC_n\cdot s^2$ = sum of ${}^NC_n$ squares, such as $(u_1 + u_2 + \ldots + u_n)^2$,
each containing n terms.

In the sum each square, such as $u_t^2$, occurs $\frac{n}{N}\times{}^NC_n$ times, and
each product $2u_su_t$ occurs $\frac{n}{N}\times\frac{n-1}{N-1}\times{}^NC_n$ times, since in all
there are $n\times{}^NC_n$ squares and $\frac{n(n-1)}{2}\times{}^NC_n$ products.

$$\therefore\ {}^NC_n\cdot s^2 = \frac{n}{N}\cdot{}^NC_n\cdot S\,u_t^2 + \frac{n(n-1)}{N(N-1)}\cdot{}^NC_n\cdot S\,2u_su_t$$

$$\therefore\ s^2 = \frac{n}{N}\cdot N\sigma^2 + \frac{n(n-1)}{N(N-1)}\cdot S\,2u_su_t,$$

where σ is the standard deviation of the universe from which selection
was made,

$$= n\sigma^2 - \frac{n(n-1)}{N-1}\,\sigma^2, \text{ since } (S\,u_t)^2 = 0,$$

$$= \sigma^2\cdot n\cdot\frac{N-n}{N-1}$$

$$= \sigma^2 n\left(1 - \frac{n}{N}\right) \quad\ldots\quad (52)$$

if $\frac{1}{N}$ is negligible.

Let $\sigma_a$ be the standard deviation of the average $\left(\frac{H}{n} + \frac{E}{n}\right)$ of the
n selections.

$$\text{Then } \sigma_a = \frac{s}{n} = \frac{\sigma}{\sqrt{n}}\cdot\sqrt{\left(1 - \frac{n}{N}\right)} \quad\ldots\quad (53)$$

If N is indefinitely great, we obtain $\frac{\sigma}{\sqrt{n}}$ as before, formula (38).

By neglecting $\frac{n}{N}$ we exaggerate the standard deviation.
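
The exact form of the result, s² = σ²·n·(N − n)/(N − 1), can be confirmed by enumerating every possible selection from a small universe. The eight measurements below are invented for illustration only.

```python
from fractions import Fraction
from itertools import combinations

def variance_of_sum_by_enumeration(universe, n):
    """Variance of the sum of n items over all equally probable selections (no replacement)."""
    sums = [sum(c) for c in combinations(universe, n)]
    mean = Fraction(sum(sums), len(sums))
    return sum((Fraction(x) - mean) ** 2 for x in sums) / len(sums)

def variance_of_sum_by_formula(universe, n):
    """sigma^2 * n * (N - n) / (N - 1), sigma being the standard deviation of the universe."""
    N = len(universe)
    mean = Fraction(sum(universe), N)
    sigma2 = sum((Fraction(u) - mean) ** 2 for u in universe) / N
    return sigma2 * n * Fraction(N - n, N - 1)

universe = [3, 7, 8, 12, 15, 21, 22, 30]   # hypothetical measurements, N = 8
n = 3

enumerated = variance_of_sum_by_enumeration(universe, n)
formula = variance_of_sum_by_formula(universe, n)
print(enumerated, formula, enumerated == formula)
```

With N = 8 and n = 3 the factor (N − n)/(N − 1) is 5/7, so that neglecting the limitation of the universe would overstate s² by two-fifths in this small case.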

It  can  be  shown  (Isserlis,  Stat.  Journal ,  1918,  pp.  75  seq.) 
that  the  frequency  of  the  sum  or  average  is  very  approximately 
normal,  when  N  is  large  as  is  generally  the  case  in  practice. 

If we use the table on p. 271 with the value $\frac{\sigma}{\sqrt{n}}$ we
exaggerate slightly throughout the chance that a deviation
exceeds any given amount.

Note. That the law of great numbers is obtainable from the
limit of the terms of $(p + q)^n$, as shown above, can be proved as a
special case of the general analysis.

Let each elemental group contain qm zeros and pm units, where
p + q = 1.

The constants of such a group are

$$A = \frac{qm\times 0 + pm\times 1}{qm + pm} = p.$$

(Diagram: qm items at O and pm items at 1, with the average A between them.)

pm, qm are at distances +q and −p from the average, A.

$$\mu_2 = \frac{qm(-p)^2 + pm(q)^2}{(p+q)m} = pq, \text{ or } \sigma = \sqrt{pq},$$

$$\kappa' = \frac{\mu_3}{\sigma^3} = \frac{q(-p^3) + p(q^3)}{(pq)^{\frac32}} = \frac{q-p}{\sqrt{pq}}.$$

Form a total by adding selections one from each of n such
curves; this satisfies the conditions for the formation of H above.
The total has a frequency curve with average pn, standard
deviation $\sigma\sqrt{n} = \sqrt{pqn}$, and $\kappa = \frac{\kappa'}{\sqrt{n}} = \frac{q-p}{\sqrt{pqn}}$, as already found,
p. 264.
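
These constants can be checked directly from the terms of (p + q)ⁿ. The sketch below computes the average, the standard deviation and κ from the binomial terms themselves and compares them with pn, √(pqn) and (q − p)/√(pqn); the values n = 50 and p = 0.2 are chosen only for illustration.

```python
import math

def binomial_constants(n, p):
    """Average, standard deviation and kappa = mu3 / sigma^3 of the terms of (p + q)^n."""
    q = 1.0 - p
    terms = [math.comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
    mean = sum(r * w for r, w in enumerate(terms))
    mu2 = sum((r - mean) ** 2 * w for r, w in enumerate(terms))
    mu3 = sum((r - mean) ** 3 * w for r, w in enumerate(terms))
    return mean, math.sqrt(mu2), mu3 / mu2 ** 1.5

n, p = 50, 0.2          # illustrative values
q = 1.0 - p
mean, sd, kappa = binomial_constants(n, p)
print(mean, p * n)
print(sd, math.sqrt(p * q * n))
print(kappa, (q - p) / math.sqrt(p * q * n))
```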
Illustrative Examples.


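In its integral form the law of great numbers, so far as the
second approximation, is

$$P(+x) = \frac{1}{\sigma\sqrt{2\pi}}\int_0^x e^{-\frac{x^2}{2\sigma^2}}\,dx - \frac{\kappa}{6\sqrt{2\pi}}\left\{1 - \left(1 - \frac{x^2}{\sigma^2}\right)e^{-\frac{x^2}{2\sigma^2}}\right\}{}^{*}$$

$$= \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac12 z^2}\,dz - \frac{\kappa}{6\sqrt{2\pi}}\left\{1 - (1 - z^2)\,e^{-\frac12 z^2}\right\}$$

$$= F(z) - \kappa f(z), \quad\ldots\quad (54)$$

where P(x) is the chance of a positive deviation from the average
not exceeding x, $z = \frac{x}{\sigma}$, F(z) is tabulated on p. 271, and
$f(z) = \frac{1}{6\sqrt{2\pi}}\left\{1 - (1 - z^2)\,e^{-\frac12 z^2}\right\}$ is tabulated on the next page.
$\kappa = \frac{\mu_3}{\sigma^3}$, and $\mu_3$ and σ are the third moment and standard deviation
respectively of the curve, calculated either a priori or from
the observations.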
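
Both tabulated functions are easily computed. The sketch below assumes the standard library error function for F(z), the area of the normal curve between the average and z, and evaluates f(z) from its definition; a few entries are printed to four places for comparison with the table.

```python
import math

def F(z):
    """Area of the normal curve between the average and z standard deviations."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def f(z):
    """f(z) = {1 - (1 - z^2) exp(-z^2 / 2)} / (6 sqrt(2 pi))."""
    return (1.0 - (1.0 - z * z) * math.exp(-0.5 * z * z)) / (6.0 * math.sqrt(2.0 * math.pi))

for z in (0.50, 1.00, 2.00, 3.00):
    print(z, round(F(z), 4), round(f(z), 4))
```

For large z the exponential term vanishes and f(z) tends to 1/(6√(2π)), or .0665, which is the limiting entry of the table.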

Eight  examples  follow  to  illustrate  the  method  of  fitting 
the  curve  to  observations.  In  the  first  two  (words  and  bricks) 
the  genesis  of  the  measurements  leads  one  to  expect  agreement 
with  the  law  of  great  numbers  ;  in  the  next  two  (skulls  and 
plaice)  application  to  biometrical  measurements  is  shown ;  in  the 
next  (ages)  there  is  an  indirect  relation  to  mental  phenomena  ; 
in  the  last  three  (speeds,  food  consumption,  and  prices)  the 
nature  of  the  variation  is  complex  and  sporadic,  and  the  form 
of  the  frequency  curve  could  not  be  forecast. 

Only  the  first  example  is  worked  in  full. 


*  See  Appendix,  Note  6,  for  the  integration. 
Table of Values of $f(z) = \frac{1}{6\sqrt{2\pi}}\left\{1 - (1 - z^2)\,e^{-\frac12 z^2}\right\}$

   z    f(z)      z    f(z)      z    f(z)      z    f(z)      z    f(z)

  .00  .0000     .50  .0225    1.00  .0665    1.50  .0935    2.00  .0935
  .01  .0000     .51  .0233    1.01  .0673    1.51  .0937    2.02  .0931
  .02  .0000     .52  .0241    1.02  .0681    1.52  .0939    2.04  .0927
  .03  .0001     .53  .0249    1.03  .0689    1.53  .0942    2.06  .0923
  .04  .0002     .54  .0258    1.04  .0697    1.54  .0944    2.08  .0919
  .05  .0003     .55  .0266    1.05  .0704    1.55  .0945    2.10  .0915
  .06  .0004     .56  .0275    1.06  .0712    1.56  .0947    2.12  .0911
  .07  .0005     .57  .0283    1.07  .0719    1.57  .0949    2.14  .0906
  .08  .0006     .58  .0292    1.08  .0727    1.58  .0951    2.16  .0902
  .09  .0008     .59  .0301    1.09  .0734    1.59  .0952    2.18  .0897
  .10  .0010     .60  .0310    1.10  .0741    1.60  .0953    2.20  .0892
  .11  .0012     .61  .0318    1.11  .0748    1.61  .0955    2.22  .0887
  .12  .0014     .62  .0327    1.12  .0755    1.62  .0956    2.24  .0882
  .13  .0017     .63  .0336    1.13  .0762    1.63  .0957    2.26  .0877
  .14  .0019     .64  .0345    1.14  .0769    1.64  .0958    2.28  .0872
  .15  .0022     .65  .0354    1.15  .0776    1.65  .0959    2.30  .0867
  .16  .0025     .66  .0363    1.16  .0782    1.66  .0959    2.32  .0862
  .17  .0028     .67  .0372    1.17  .0789    1.67  .0960    2.34  .0857
  .18  .0032     .68  .0381    1.18  .0795    1.68  .0960    2.36  .0853
  .19  .0035     .69  .0390    1.19  .0801    1.69  .0961    2.38  .0848
  .20  .0039     .70  .0399    1.20  .0807    1.70  .0961    2.40  .0843
  .21  .0043     .71  .0409    1.21  .0813    1.71  .0961    2.42  .0838
  .22  .0047     .72  .0418    1.22  .0819    1.72  .0961    2.44  .0833
  .23  .0052     .73  .0427    1.23  .0825    1.73  .0962    2.46  .0828
  .24  .0056     .74  .0436    1.24  .0831    1.74  .0962    2.48  .0823
  .25  .0061     .75  .0445    1.25  .0836    1.75  .0962    2.50  .0818
  .26  .0066     .76  .0455    1.26  .0842    1.76  .0961    2.52  .0814
  .27  .0071     .77  .0464    1.27  .0847    1.77  .0961    2.54  .0809
  .28  .0076     .78  .0473    1.28  .0852    1.78  .0961    2.56  .0804
  .29  .0081     .79  .0482    1.29  .0857    1.79  .0960    2.58  .0800
  .30  .0086     .80  .0491    1.30  .0862    1.80  .0960    2.60  .0795
  .31  .0092     .81  .0500    1.31  .0867    1.81  .0959    2.62  .0791
  .32  .0098     .82  .0509    1.32  .0871    1.82  .0958    2.64  .0787
  .33  .0104     .83  .0518    1.33  .0876    1.83  .0958    2.66  .0782
  .34  .0110     .84  .0527    1.34  .0880    1.84  .0957    2.68  .0778
  .35  .0116     .85  .0536    1.35  .0885    1.85  .0956    2.70  .0774
  .36  .0122     .86  .0545    1.36  .0889    1.86  .0955    2.72  .0770
  .37  .0129     .87  .0554    1.37  .0893    1.87  .0954    2.74  .0766
  .38  .0136     .88  .0563    1.38  .0897    1.88  .0953    2.76  .0762
  .39  .0142     .89  .0572    1.39  .0901    1.89  .0952    2.78  .0759
  .40  .0149     .90  .0581    1.40  .0904    1.90  .0950    2.80  .0755
  .41  .0156     .91  .0589    1.41  .0908    1.91  .0949    2.82  .0752
  .42  .0164     .92  .0598    1.42  .0912    1.92  .0948    2.84  .0748
  .43  .0171     .93  .0607    1.43  .0915    1.93  .0946    2.86  .0745
  .44  .0178     .94  .0616    1.44  .0918    1.94  .0945    2.88  .0742
  .45  .0186     .95  .0624    1.45  .0922    1.95  .0943    2.90  .0738
  .46  .0193     .96  .0632    1.46  .0924    1.96  .0942    2.92  .0735
  .47  .0201     .97  .0640    1.47  .0927    1.97  .0940    2.94  .0732
  .48  .0209     .98  .0649    1.48  .0930    1.98  .0938    2.96  .0730
  .49  .0217     .99  .0657    1.49  .0932    1.99  .0937    2.98  .0727

   z    f(z)      z    f(z)

 3.00  .0724    3.80  .0671
 3.20  .0702    4.00  .0668
 3.40  .0687    4.20  .0666
 3.60  .0677     ∞    .0665
1. A lengthy book was selected, and the number of letters in
each of the first completed words in 10,000 consecutive lines
was noted (A), also the total number of letters in the
1000 batches obtained by adding the first 10 entries, the
second 10, etc. (B); and 100 totals were similarly obtained by
adding batches of 100 (C).

The  curve  of  frequency  of  A  is  purely  observational,  and 
its  form  cannot  be  foretold  ;  that  of  B  tends  to  satisfy  the 
conditions  under  which  the  law  of  great  numbers  appears,  but 
“  n  ”  is  only  10,  and  unless  A  is  nearly  normal  the  form  can 
only  be  foretold  in  the  central  region  ;  in  C,  with  “  n  ”  100, 
the  second  approximation  should  fit  over  a  considerable 
region,  and  the  first  approximation  will  be  sufficient  if  A  is 
fairly  symmetrical. 

A. Distribution of 10,000 Words According to the Numbers of Letters in them.

Number of   Limits.          x    Observa-       xy        x²y         x³y       z      F(z)*   Diff.
letters.                          tions, y.                                                     × 10,000.

    1       .5 to 1.5       -7       127        -889      6,223     -43,561   -1.62    .447       490
    2      1.5 to 2.5       -6     1,792     -10,752     64,512    -387,072   -1.27    .398       770
    3      2.5 to 3.5       -5     1,984      -9,920     49,600    -248,000    -.92    .321     1,020
    4      3.5 to 4.5       -4     1,240      -4,960     19,840     -79,360    -.58    .219     1,280
    5      4.5 to 5.5       -3       968      -2,904      8,712     -26,136    -.23    .091 }   1,390
    6      5.5 to 6.5       -2       812      -1,624      3,248      -6,496    +.12    .048 }   1,330
    7      6.5 to 7.5       -1       893        -893        893        -893    +.47    .181     1,130
    8      7.5 to 8.5        0       634           0          0           0    +.82    .294       850
    9      8.5 to 9.5        1       602        +602        602        +602   +1.17    .379       570
   10      9.5 to 10.5       2       460        +920      1,840      +3,680   +1.52    .436       330
   11     10.5 to 11.5       3       260        +780      2,340      +7,020   +1.87    .469       180
   12     11.5 to 12.5       4       116        +464      1,856      +7,424   +2.22    .487        80
   13     12.5 to 13.5       5        69        +345      1,725      +8,625   +2.57    .495        30
   14     13.5 to 14.5       6        21        +126        756      +4,536   +2.92    .498        10
   15     14.5 to 15.5       7        18        +126        882      +6,174   +3.27    .499        10
   16     15.5 to 16.5       8         4         +32        256      +2,048   +3.62    .500         0

   Totals                         10,000     -31,942    163,285    -791,518
                                              +3,395                +40,109
                                             -28,547               -751,409

x̄ = -2.8547. Average is 8 + x̄ = 5.1453.
μ₂ = 16.3285 - x̄² = 8.1792, σ = 2.860.

μ₃ = -75.1409 - 3(-2.8547)(16.3285) + 2(-2.8547)³ = 18.1704.

For  calculating  the  moments  an  arbitrary  origin  has  been 
taken  at  8. 
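
The arithmetic of the moments may be retraced as follows. This is only a sketch of the working-origin method: the frequencies are those of Table A, the origin is taken at 8 letters as in the text, and the variable names are ours.

```python
# Frequencies from Table A: number of letters -> number of words observed.
observations = {1: 127, 2: 1792, 3: 1984, 4: 1240, 5: 968, 6: 812, 7: 893, 8: 634,
                9: 602, 10: 460, 11: 260, 12: 116, 13: 69, 14: 21, 15: 18, 16: 4}
origin = 8   # arbitrary working origin

total = sum(observations.values())

def raw_moment(k):
    """Mean of x^k, where x is measured from the working origin."""
    return sum((letters - origin) ** k * y for letters, y in observations.items()) / total

x_bar, m2, m3 = raw_moment(1), raw_moment(2), raw_moment(3)

average = origin + x_bar                        # average number of letters
mu2 = m2 - x_bar ** 2                           # second moment about the average
sigma = mu2 ** 0.5
mu3 = m3 - 3 * x_bar * m2 + 2 * x_bar ** 3      # third moment about the average

print(total, round(average, 4), round(mu2, 4), round(sigma, 3), round(mu3, 3))
```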

In fitting the normal curve $z = \frac{x - \bar{x}}{\sigma}$. The first entry
F(z) = .447 shows the proportion included between the average
and .5 letters (x = -7.5). The normal curve gives 530 instances
below .5 letters, and in other respects the last column is not a
close approximation to y, the observations. The original curve
is not normal, but it is unimodal and continuous, and in spite
of its skewness the great bulk is contained in the limits
x̄ ± 2σ. Hence we have all the conditions for obtaining the
law of great numbers if we add elements taken at random from
the curve.

* Table, p. 271.


B. Distribution of 1000 Sums of the Letters in 10 Words.

Number of      z        F(z)   Differences   Observa-    f(z) †   F(z) - κf(z) ‡   Differences
letters.                        × 1000.      tions.                                  × 1000.

                                     4           0                                       0
  26.5      -2.650     .496         13           8        .078        .528               8
  31.5      -2.119     .483         39          38        .091        .520              37
  36.5      -1.588     .444         89          97        .095        .483              99
  41.5      -1.057     .355        154         155        .071        .384             173
  46.5       -.526     .201 }      203         227        .025        .211 }           213
  51.5       +.005     .002 }      202         202        .000        .002 }           191
  56.5       +.536     .204        153         134        .026        .193             135
  61.5      +1.067     .357         88          76        .072        .328              78
  66.5      +1.598     .445         38          37        .095        .406              40
  71.5      +2.129     .483         13          13        .091        .446              18
  76.5      +2.660     .496          3           9        .078        .464               7
  81.5      +3.191     .499          1           3        .069        .471               2
  86.5      +3.722     .500          0           1        .067        .473               0

For the 1000 sums the average is 51.453, σ = 9.4155,
κ = .4093. The sums are all between 26 and 87. The calculation
of the columns z, F(z) and Differences is on the same
method as for A. The normal curve now fits much better,
and in the range 31.5 to 76.5, that is, average ± 2σ, there is
nothing to be desired; but the formula gives too many below
31.5 and too few above 76.5.

The second approximation gives a very close fit throughout,
except that it fails to stretch so as to include the one entry
above 86.5 (see p. 432 for test of fit).
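
The two columns of Differences in Table B can be recomputed from the constants just given. In the sketch below F(z) is obtained from the standard library error function, the sign of the κ term is taken as in the table (subtracted for positive z, added for negative z), and the function names are ours; the figures agree with the table only to the rounding of the three-place entries.

```python
import math

def F(z):
    """Area of the normal curve between the average and |z|."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2.0))

def f(z):
    """f(z) = {1 - (1 - z^2) exp(-z^2 / 2)} / (6 sqrt(2 pi))."""
    return (1.0 - (1.0 - z * z) * math.exp(-0.5 * z * z)) / (6.0 * math.sqrt(2.0 * math.pi))

def area_from_average(z, kappa):
    """Chance of a deviation lying between the average and z (kappa = 0 gives the normal curve)."""
    sign = 1.0 if z >= 0 else -1.0
    return F(z) - sign * kappa * f(z)

def expected(lower, upper, average, sigma, kappa, total):
    """Expected number of observations between two limits."""
    z1, z2 = (lower - average) / sigma, (upper - average) / sigma
    a1, a2 = area_from_average(z1, kappa), area_from_average(z2, kappa)
    if z2 <= 0:
        return total * (a1 - a2)      # class wholly below the average
    if z1 >= 0:
        return total * (a2 - a1)      # class wholly above the average
    return total * (a1 + a2)          # class containing the average

average, sigma, kappa = 51.453, 9.4155, 0.4093   # constants of the 1000 sums
for lower in range(26, 86, 5):
    lo, hi = lower + 0.5, lower + 5.5
    first = expected(lo, hi, average, sigma, 0.0, 1000)
    second = expected(lo, hi, average, sigma, kappa, 1000)
    print(lo, hi, round(first), round(second))
```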


† Table, p. 303.

‡ + for negative values of z.
We should have expected to find the standard deviation
and κ of these observations to be the standard deviation
(2.860) and κ (.81) multiplied respectively by $\sqrt{10}$ and $\frac{1}{\sqrt{10}}$
(formulae (37) and (39)).

But 2.860 × $\sqrt{10}$ = 9.04 and .81 ÷ $\sqrt{10}$ = .25, whereas we
get from the B observations 9.42 and .41. This points to a
failure of complete independence in the aggregation of the
10 words; and analysis shows that the author's style
changes from the earlier to the later part of the book, so that
there is some correlation between 10 words taken consecutively.
In fact, when we sum 100 words consecutively as in C, we get
σ = 33.311 instead of 2.86 × $\sqrt{100}$, while when the order of
summation was re-arranged so as to include entries from all
parts of the book in each 100, σ was 28.87, which accords
with theory.
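
What complete independence would give can be worked out exactly by combining the distribution of Table A with itself ten times. The sketch below does this by direct convolution; the frequencies are those of Table A, and the constants are computed from those frequencies without any correction, so the value of κ shown is that of the raw moments.

```python
import math

# Frequencies from Table A: number of letters -> number of words observed.
table_a = {1: 127, 2: 1792, 3: 1984, 4: 1240, 5: 968, 6: 812, 7: 893, 8: 634,
           9: 602, 10: 460, 11: 260, 12: 116, 13: 69, 14: 21, 15: 18, 16: 4}

def constants(dist):
    """Average, standard deviation and kappa = mu3 / sigma^3 of {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    mu2 = sum((v - mean) ** 2 * p for v, p in dist.items())
    mu3 = sum((v - mean) ** 3 * p for v, p in dist.items())
    return mean, math.sqrt(mu2), mu3 / mu2 ** 1.5

def convolve(d1, d2):
    """Distribution of the sum of two independent selections."""
    out = {}
    for v1, p1 in d1.items():
        for v2, p2 in d2.items():
            out[v1 + v2] = out.get(v1 + v2, 0.0) + p1 * p2
    return out

total = sum(table_a.values())
single = {v: y / total for v, y in table_a.items()}

sum10 = {0: 1.0}
for _ in range(10):                  # ten independent words
    sum10 = convolve(sum10, single)

mean1, sd1, k1 = constants(single)
mean10, sd10, k10 = constants(sum10)
print(round(mean10, 3), round(sd10, 2), round(sd1 * math.sqrt(10), 2))
print(round(k10, 4), round(k1 / math.sqrt(10), 4))
```

The standard deviation of the independent sums is 9.04, as stated; the observed 9.42 is therefore the mark of correlation between neighbouring words and not of any failure of the formula.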

C. Distribution of 100 Totals of the Letters in 100 Words.

Number of       z        F(z)    Differences    Observa-
letters.                          × 100.        tions.

   415       -3.001      .499        .7            1
   435       -2.400      .492       2.8            2
   455       -1.800      .464       7.9            7
   475       -1.200      .385      16.0           19
   495        -.599      .225      22.5           25
   515        -.001      .000      22.6           18
   535        +.602      .226      15.9           18
   555       +1.202      .385       7.9            6
   575       +1.803      .464       2.8            3
   595       +2.403      .492        .7            0
   615       +3.003      .499        .1            1
   635       +3.604      .500

The  agreement  between  formula  and  observations  in  this 
table  is  very  close  (see  p.  432),  and  cannot  be  improved 
perceptibly  by  using  the  second  approximation. 

This experiment, which was devised with the definite
intention of illustrating the law of great numbers (and the
correlation surface, formula (102)), has thus proved to be
completely satisfactory, even in that it also illustrates the
difficulty of securing random selection.

2.  In  a  garden  the  paths  are  bordered  by  bricks  originally 
laid  (but  not  mortared)  lengthways  touching  each  other.  After 
they  had  been  exposed  for  some  time  to  the  influences  of 
weather  and  of  gardening  operations,  the  lengths  occupied  by 
143  sequences  of  4  bricks  were  measured  as  nearly  as  possible 
to  the  nearest  sixteenth  of  an  inch.  The  causes  of  variation 
were — inequalities  of  the  bricks  as  they  came  from  the  mould, 
inequalities  in  the  slight  interval  between  one  and  the  next, 
displacement  since  they  were  laid,  and  difficulties  of  measure¬ 
ment.  These  causes  are  multiple  and  independent  and  each 
of  small  effect.  It  might  be  expected  that  their  effects  can 
be  expressed  as  the  sum  of  errors,  and  that  the  distribution  of 
the  measurements  would  be  approximately  normal  and 
symmetrical. 


Distribution of Lengths of Four Bricks.

Length          Number of        Calculated by formula
(inches).       observations.    (normal curve).

35                    1                  .7
35 1/16               1                 1.4
35 1/8                3                 2.7
35 3/16               7                 5.1
35 1/4               11                 8.0
35 5/16               4                11.6
35 3/8               21                15.0
35 7/16               7                17.7
35 1/2               30                18.3
35 9/16              16                17.5
35 5/8               13                14.9
35 11/16              6                11.4
35 3/4               11                 7.9
35 13/16              7                 5.0
35 7/8                4                 2.7
35 15/16              1                 1.4
36                    0                  .6

Total               143               142.9
Except for an obvious tendency to give the measurements
to the nearest 1/8th of an inch instead of the 1/16th, the fit is fairly
satisfactory. When this tendency is corrected the fit is very
good.

3. I am indebted to Professor C. G. Seligman's Some
Aspects of the Hamitic Problem in the Anglo-Egyptian Sudan
for the following measurements, whose frequency groups I
analysed at his request.


Skull and Stature Measurements of the Dinka Race.

                                Cephalic Index.         Nasal Index.            Stature.
Grades from         F(z)     Difference  Observa-   Difference  Observa-   Difference  Observa-
average.                      × 148.     tions.      × 85.      tions.      × 116.     tions.

Over 3σ            .4986         .2         2*          .1         0           .1         0
5/2 σ to 3σ        .4938         .9         1           .5         1           .7         2
2σ to 5/2 σ        .4772        2.3         2          1.3         0          1.8         1
3/2 σ to 2σ        .4332        6.5         4          3.7         4          5.1         1
σ to 3/2 σ         .3413       13.6        14          7.8         6         10.7         6
σ/2 to σ           .1915       22.2        18         12.8        13         17.4        22
0 to σ/2           0           28.3        30         16.3        27         22.2        24
0 to -σ/2          0           28.3        30         16.3        12         22.2        25
-σ/2 to -σ         .1915       22.2        25         12.8         8         17.4        23
-σ to -3/2 σ       .3413       13.6        13          7.8         6         10.7         6
-3/2 σ to -2σ      .4332        6.5         7          3.7         4          5.1         4
-2σ to -5/2 σ      .4772        2.3         2          1.3         3          1.8         1
-5/2 σ to -3σ      .4938         .9         0           .5         1           .7         1
Under -3σ          .4986         .2         0           .1         0           .1         0

Total                           148        148          85        85          116       116

Average                                   72.7                   91.6                  178.6 cm.
Standard deviation                         3.70                  13.0                    9.66

(F(z) is entered for the limit of each grade nearer to the average.)

Except for the two extreme cases marked * the range is
normal, and the deviations from the normal curve are not
greater than is to be expected with so few examples.

4.  The  lengths  of  554  plaice  measured  in  the  North  Sea 
Fisheries  Investigation  gave  the  following  results  : — 
Length, cm.          z         F(z)     Difference    Observa-
                                         × 554.       tions.

Over 35.5          2.825      .4976        1.3           0
34.5 to 35.5       2.076      .4810        9.2           6
33.5 to 34.5       1.327      .4077       40.6          50
32.5 to 33.5        .578      .2183      104.9         105
31.5 to 32.5       -.171      .0679      158.6         166
30.5 to 31.5       -.920      .3212      140.3         145
29.5 to 30.5      -1.669      .4524       72.7          61
28.5 to 29.5      -2.418      .4922       22.0          10
27.5 to 28.5      -3.167      .4992        3.9           7
26.5 to 27.5      -3.916      .5            .4           3
25.5 to 26.5      -4.665      .5           0             1

Total                                                  554

(z and F(z) are entered for the lower limit of each class.)

Average 31.778; σ = 1.335.

The  agreement  is  not  close  at  the  extremities. 

5.  The  number  of  school  children  of  various  ages  in  the 
sixth  grade  are  given  in  a  report  of  the  public  schools  of 
St.  Louis,  U.S.A. 

The  following  table  compares  the  data  with  the  first  and 
second  approximations  to  the  law  of  great  numbers 


Ages.      Number of       Numbers calculated from
           children.     1st Approx.    2nd Approx.

10-             26            39             27
11-            201           207            204
12-            673           630            670
13-          1,001           983            995
14-            739           785            746
15-            310           323            307
16-             80            67             79
17-             13             9             15
18-              1             0              0

Average age, 13.665; σ = 1.190; κ = .2059.

The first approximation fits well within 2σ of the average.

The  second  approximation  is  remarkably  close  to  the 
observations  (see  diagram  in  Appendix,  Note  6). 

6. The speeds of 100 pedestrians were calculated from
observing the time they took between two marks (Die Schwankungen
der landwirtschaftlichen Reinerträge, Mitscherlich).

Average velocity, 1.5846 metres per second, σ = .2179 m.

Numbers at various speeds.

Speed.                          Calculated.    Actual.

Average + .50 m. or more            1.1           2
   + .40  to  + .50                 2.3           2
   + .30  to  + .40                 5.0           4
   + .20  to  + .30                 9.6          11
   + .10  to  + .20                14.3          10
     0    to  + .10                17.7          18
   - .10  to    0                  17.7          20
   - .20  to  - .10                14.3          15
   - .30  to  - .20                 9.6           8
   - .40  to  - .30                 5.0           7
   - .50  to  - .40                 2.3           3
   - .50 or less                    1.1           0

7.  From  material  collected  by  the  Working  Class  Cost  of 
Living  Committee,  1918,  the  expenditure  on  food  in  one  week 
by  970  urban  families  was  determined,  and  the  results  divided 
by  the  number  of  “  equivalent  adults  ”  (where  a  child  is  taken 
as a fraction of an adult). The average was 10.755; σ = 3.156,
κ = .84.

Number of Families.

Weekly expenditure per      Observed.     Calculated by     Calculated from
"unit" on food.                           2nd approx.       Pearson's Type III.*

Not exceeding 5.5s.             18             22                  7
 5.5s. ....                    107            123                122
 7.5   ....                    255            233                252
 9.5   ....                    245            248                250
11.5   ....                    173            168                172
13.5   ....                    101             89                 95
15.5   ....                     38             51                 45
17.5   ....                     17 \           22 \               19 \
19.5   ....                      9  } 33       11  } 35            7  } 27
Over 21.5

ft 

06 

0 

t> 

II 

ij 

=  4-035- 

ij 

8.  The  price  of  flour  was  determined  in  U.S.A.  in  272  places. 
In  five  towns  the  price  was  given  as  4  cents  per  lb.,  and 
these  were  evidently  exceptional  and  are  excluded.  For  the 
remaining 267 the average was 2.629 cents per lb., and σ = .3334.


* See p. 345.

Towns Classified According to the Price of Flour.

Price (measured from average).      Calculated       Actual.
                                    1st Approx.

+ 3σ or more                             .4             2
+ 5/2σ to + 3σ                          1.3             3
+ 2σ to + 5/2σ                          4.4             1
+ 3/2σ to + 2σ                         11.8             9
+ σ to + 3/2σ                          24.5            16
+ ½σ to + σ                            40.0            43
0 to + ½σ                              51.1            67
- ½σ to 0                              51.1            47
- σ to - ½σ                            40.0            37
- 3/2σ to - σ                          24.5            25
- 2σ to - 3/2σ                         11.8             9
- 5/2σ to - 2σ                          4.4             4
Less than - 5/2σ                        1.7             4

In the range average ± 2σ the agreement is fairly satisfactory and satisfies the test explained below (Chapter X).


CHAPTER  IV. 


APPLICATIONS  OF  THE  LAW  OF  ERROR. 

Precision  of  Sums  and  Averages. 

It follows from the previous chapter that if n measurable things are selected at random from a universe where the sizes are distributed in a frequency group which is fairly continuous, and little of it far from its average as compared with its standard deviation (σ), then the average belongs to a frequency curve whose standard deviation is $\sigma/\sqrt{n}$ and whose form is approximately normal.

σ has generally to be determined from the observations
themselves, and may differ from that of the universe, but only

by  a  quantity  of  order  x  — :=  (see  p.  417  below). 

vn  Vn 

The  first  illustration  that  follows  (persons  per  tenement) 
gives  12  cases  where  the  averages  of  samples  are  compared 
with  the  averages  of  the  universes  of  which  they  are  samples. 

The  two  following  illustrations  (digits  and  latitudes)  show 
how  the  distribution  of  a  number  of  averages  agrees  with  the 
normal  curve  of  error. 

In  cases  in  which  the  theory  applies,  not  only  the  standard 
deviation  of  the  average  can  be  given,  but  also  the  chances 
that  the  error  of  the  average  will  exceed  any  given  multiple 
of  that  standard  deviation. 

Since the universe is unknown it cannot always be stated whether its frequency group satisfies Edgeworth's conditions (p. 299) or not. We can sometimes test this from the samples themselves. Suppose that we take k samples each of n' items, and form their averages $x_1, x_2, \ldots, x_k$ into a frequency group. Then if the conditions in the universe are satisfied, this frequency group should be approximately normal, but not completely normal if n' is not large. If this is the case a major sample may now be formed consisting of $n = n' \times k$ items. Its average equals the average of $x_1, x_2, \ldots, x_k$, and as it is formed by selection of k things from a group, which, being approximately normal, satisfies the conditions in question, we may expect that the error in the major average has normal frequency with standard deviation $\sigma/\sqrt{n}$, where σ is calculated from the n observations.

The k quantities $x_1, x_2, \ldots, x_k$ should have for their standard deviation $\sigma/\sqrt{n'}$ approximately.

Thus in the example on p. 315 below the distribution of the 2000 items which are aggregated in 80 groups is not known. Here k = 80, n' = 25. The averages of the 80 groups are found to have standard deviation 1.628. It may be deduced that the standard deviation in the universe is approximately $1.628 \times \sqrt{25} = 8.14$. Then the standard deviation for the average based on the whole 2000 is $\frac{1.628 \times \sqrt{25}}{\sqrt{2000}} = \frac{1.628}{\sqrt{80}}$, as stated below.
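The arithmetic of this paragraph can be sketched in a few lines of Python. The function name and argument names are illustrative assumptions, not taken from the text; only the figures 1.628, 25 and 80 come from the example.

```python
import math

def major_average_sd(group_sd, group_size, n_groups):
    """Return (estimated universe s.d., s.d. of the average of all items).

    Assumes, as the text does, that the s.d. of group averages is
    approximately sigma / sqrt(n'), where n' is the group size.
    """
    universe_sd = group_sd * math.sqrt(group_size)   # sigma ~ s.d. of averages * sqrt(n')
    total_items = group_size * n_groups              # n = n' * k
    return universe_sd, universe_sd / math.sqrt(total_items)

universe_sd, average_sd = major_average_sd(1.628, 25, 80)
print(round(universe_sd, 2), round(average_sd, 2))
```

The second value is the same as 1.628 divided by the square root of 80, since the factor for the group size cancels.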

As  an  alternative  we  can  examine  the  frequency  group 
formed  by  the  n  selections  and  see  if  it  satisfies  Edgeworth’s 
conditions.  If  it  does,  we  may  take  it  that  the  error  of  the 
average  has  normal  frequency. 


Precision  of  Averages. 

A sample was taken (as described on p. 281) of the householders' Census schedules in a number of districts, and the average number of persons per tenement was calculated in 12 districts.

                          In sample of 1 in 50.               In whole district.
Registration district.    Tene-     Persons.   Persons per    Tene-      Persons per    Standard
                          ments.               tenement.      ments.     tenement.      deviation.

Bethnal Green, N.E.        277       1,224       4.42         13,850       4.35           .14
      „        S.W.        278       1,261       4.54         13,905       4.60           .14
Shoreditch, S.             187         792       4.24          9,331       4.26           .18
      „     N.W.           152         693       4.56          7,623       4.34           .19
      „     N.E.           156         653       4.19          7,847       4.39           .19
Spitalfields               130         637       4.90          6,476       4.79           .21
Whitechapel                117         519       4.44          5,914       4.72           .22
St. George                 187         924       4.93          9,374       4.88           .18
Shadwell                    95         387       4.07          4,800       4.37           .25
Limehouse                  133         611       4.59          6,655       4.54           .21
Mile End, S.W.             267       1,211       4.54         13,366       4.71           .15
      „   N.E.             207         839       4.05         10,364       4.40           .17
From a study of the Census volumes relating to the whole districts the standard deviations (σ) of the number of persons per tenement are found to range from 2.38 to 2.75.

The standard deviation for the first entry above is then $2.38/\sqrt{277} = .14$ if we take the lower and more stringent value of σ. The other standard deviations are calculated similarly.

The  differences  between  the  sample  averages  and  the  whole 
in  6  cases  are  less  than  the  calculated  standard  deviation,  in 
4  cases  exceed  it  by  less  than  a  quarter  of  itself,  in  1  case  by 
30  per  cent.,  and  in  1  case  the  difference  is  twice  the  standard 
deviation. 
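As a check on the count just given, the comparison can be repeated mechanically. The sketch below assumes σ = 2.38 throughout, as the text does, and uses unrounded standard deviations, so individual ratios differ slightly from those obtained with the two-figure values in the table; the variable names are illustrative only.

```python
import math

SIGMA = 2.38  # lower value of the s.d. of persons per tenement, from the text

# (district, tenements in sample, sample average, whole-district average)
districts = [
    ("Bethnal Green, N.E.", 277, 4.42, 4.35),
    ("Bethnal Green, S.W.", 278, 4.54, 4.60),
    ("Shoreditch, S.",      187, 4.24, 4.26),
    ("Shoreditch, N.W.",    152, 4.56, 4.34),
    ("Shoreditch, N.E.",    156, 4.19, 4.39),
    ("Spitalfields",        130, 4.90, 4.79),
    ("Whitechapel",         117, 4.44, 4.72),
    ("St. George",          187, 4.93, 4.88),
    ("Shadwell",             95, 4.07, 4.37),
    ("Limehouse",           133, 4.59, 4.54),
    ("Mile End, S.W.",      267, 4.54, 4.71),
    ("Mile End, N.E.",      207, 4.05, 4.40),
]

ratios = []
for name, n, sample_avg, whole_avg in districts:
    sd = SIGMA / math.sqrt(n)                    # s.d. of the sample average
    ratio = abs(sample_avg - whole_avg) / sd     # difference in units of that s.d.
    ratios.append(ratio)
    print(f"{name:22s} s.d. {sd:.2f}  difference/s.d. {ratio:.2f}")

within_one_sd = sum(r <= 1 for r in ratios)
beyond_two_sd = sum(r > 2 for r in ratios)
```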

Normal  Distribution  of  Averages. 

Ten digits were selected from successive final digits in seven-figure mathematical tables and summed, and the process repeated till 1000 totals were obtained.*

The average and the standard deviation of the group so obtained were 45.014 and 9.205 respectively, as compared with 45 and $\sqrt{82.5} = 9.083$, which would be obtained from an indefinitely large random selection if the digits 0 to 9 were equally distributed.
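The theoretical values 45 and √82.5 follow from the mean and variance of a single digit, each multiplied by ten under the assumption that the ten digits are independent. A minimal sketch, with variable names of our own choosing:

```python
import math

digits = range(10)                                   # 0 to 9, equally likely
mean_digit = sum(digits) / 10                        # 4.5
var_digit = sum((d - mean_digit) ** 2 for d in digits) / 10   # 8.25

# For a sum of 10 independent digits, means and variances add.
mean_sum = 10 * mean_digit
sd_sum = math.sqrt(10 * var_digit)

print(mean_sum, round(sd_sum, 3))
```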

The  following  table  compares  the  distribution  of  the  1000 
with  the  normal  curve  of  error. 


Number of Totals of 10 Falling Within Certain Limits.

Distance from                          Standard
average.             Calculated.      deviation.    Observations.    Differences.

Above 5/2σ                 6             2.4              8              + 2
2σ to 5/2σ                17             4.1             17                0
3/2σ to 2σ                44             6.5             47              + 3
σ to 3/2σ                 92             9.1             75              - 17
½σ to σ                  150            11.3            157              + 7
0 to ½σ                  191            12.4            197              + 6
0 to - ½σ                191            12.4            201              + 10
- ½σ to - σ              150            11.3            148              - 2
- σ to - 3/2σ             92             9.1             77              - 15
- 3/2σ to - 2σ            44             6.5             50              + 6
- 2σ to - 5/2σ            17             4.1             20              + 3
Below - 5/2σ               6             2.4              3              - 3

The standard deviations are calculated from the formula $\sqrt{p(1-p)n}$ (formula (13)), where n = 1000 and p is the proportion that falls within a grade in the normal law; thus between σ and 3/2σ, .092 of all are expected and p = .092.

* Such selections are found not to satisfy completely the conditions of independence. See Statistical Journal, 1912-13, p. 702. Nixon.

In arranging the observations σ is taken as 9.205.

The  differences  between  theory  and  observation  are  less 
than  the  standard  deviation  in  9  cases,  and  exceed  it  but  are 
less  than  double  in  3  cases. 

The  normal  curve  is  therefore  an  adequate  representation 
of  the  group. 
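The "Calculated" and "Standard deviation" columns of such a comparison can be reproduced, to the accuracy of rounding, from the normal integral. The sketch below uses the error function from Python's standard library; the function names are illustrative assumptions, and the grades are taken at half-σ steps on one side of the average.

```python
import math

def normal_cdf(z):
    """Proportion of a normal frequency group lying below z standard deviations."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grade_table(n, edges):
    """Expected number and binomial s.d., sqrt(p(1-p)n), for each grade."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = normal_cdf(hi) - normal_cdf(lo)
        rows.append((lo, hi, n * p, math.sqrt(p * (1 - p) * n)))
    return rows

rows = grade_table(1000, [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, math.inf])
for lo, hi, expected, sd in rows:
    print(f"{lo} to {hi}: expected {expected:.1f}, s.d. {sd:.1f}")
```

For the grade from σ to 3/2σ this gives a proportion close to .092, in agreement with the value of p quoted in the text.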

The average as found from the whole sample of 10,000 is determined as 45.014 with standard deviation $9.205/\sqrt{1000} = .29$, and is unexpectedly near the average of the sum of 10 digits in general.

From  a  geographical  index  containing  31,210  names,  25 
were  selected  roughly  at  random,  and  their  latitudes  entered 
to  a  degree  (ignoring  minutes),  the  distinction  between  north 
and  south  being  ignored.  The  25  latitudes  were  then  averaged 
and  the  process  repeated  till  80  averages  were  obtained. 

The general average was 35.0° and the standard deviation of the group of 80 averages was 1.628°.

The  following  table  compares  the  distribution  with  the 
normal  curve  of  error  on  the  same  plan  as  in  the  last  example. 


Distance from                          Standard                        Differences from
average.             Calculated.      deviation.    Observations.     nearest integer.

Above 5/2σ                .5              -               1               0 or 1
2σ to 5/2σ               1.3            ?1.1              2               1
3/2σ to 2σ               3.5             1.8              1               2 or 3
σ to 3/2σ                7.4             2.6             10               3
½σ to σ                 12.0             3.2             12               0
0 to ½σ                 15.3             3.5             12               3
0 to - ½σ               15.3             3.5             16               1
- ½σ to - σ             12.0             3.2             15               3
- σ to - 3/2σ            7.4             2.6              7               0
- 3/2σ to - 2σ           3.5             1.8              2               1 or 2
- 2σ to - 5/2σ           1.3            ?1.1              1               0
Below - 5/2σ              .5              -               1               0 or 1

In eight cases the difference is below the standard deviation, and in the remaining two slightly above it.

The average 35.0°, as found from the sample of 80 × 25 = 2000 latitudes, has standard deviation $1.628/\sqrt{80}$ degrees = .18 degree, and is therefore not known accurately to the first decimal place.

Absolute Errors in Weighted Sums and Averages.


It has been shown above (p. 288) that if H + E is the sum of n quantities selected independently from n frequency groups, whose averages are ${}_1\bar{u}, {}_2\bar{u}, \ldots, {}_n\bar{u}$, and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$, and $H = {}_1\bar{u} + {}_2\bar{u} + \ldots + {}_n\bar{u}$, then $H + E_t$, the sum in any selection, $= {}_1u_t + {}_2u_t + \ldots + {}_nu_t$, has for its standard deviation s, where $s^2 = \sigma_1^2 + \sigma_2^2 + \ldots + \sigma_n^2$. The same analysis can readily be re-arranged to show that, if we take a weighted sum

$$H + E_t = W_1 \cdot {}_1u_t + W_2 \cdot {}_2u_t + \ldots + W_n \cdot {}_nu_t,$$

where $W_1, W_2, \ldots$ are constants, the standard deviation becomes

$$s^2 = W_1^2\sigma_1^2 + W_2^2\sigma_2^2 + \ldots + W_n^2\sigma_n^2 = S(W_t^2\sigma_t^2) \qquad (55)$$

and the standard deviation, $s_a$, of the weighted average $\frac{H + E_t}{SW_t}$ is given by

$$s_a^2 = \frac{S(W_t^2\sigma_t^2)}{(SW_t)^2} \qquad (56)$$

If n is large, so that $1/\sqrt{n}$ is negligible, and the other conditions stated on p. 299 are satisfied, the frequencies of the sum and average are normal, and the table on p. 271 can be used to ascertain the chances of deviations from the mean value H.

Let $\sigma^2$ be the weighted mean value of $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, so that

$$\sigma^2 S(W_t^2) = S(W_t^2\sigma_t^2).$$

Then $s^2 = \sigma^2 S(W_t^2)$, and

$$s_a^2 = \sigma^2\frac{S(W_t^2)}{(SW_t)^2} \qquad (57)$$


Now let $SW_t = n\bar{w}$, and $W_t = \bar{w} + w_t$, and $n\sigma_w^2 = S(w_t^2)$, so that $\bar{w}$ and $\sigma_w$ are the average and standard deviation of the W's regarded as a frequency group. $Sw_t = 0$.

Then

$$S(W_t^2) = S(\bar{w}^2 + 2\bar{w}w_t + w_t^2) = n\bar{w}^2 + 2\bar{w}Sw_t + Sw_t^2 = n(\bar{w}^2 + \sigma_w^2) \qquad (58)$$

and

$$s^2 = n\sigma^2(\bar{w}^2 + \sigma_w^2), \qquad (59)$$

$$s_a^2 = \sigma^2\frac{n(\bar{w}^2 + \sigma_w^2)}{(n\bar{w})^2},$$

and

$$s_a = \frac{\sigma}{\sqrt{n}}\sqrt{1 + \frac{\sigma_w^2}{\bar{w}^2}} \qquad (60)$$

The last formula gives in a convenient form the standard deviation of a weighted average, when the weights are known and not subject to error. The deviation of the original items is reduced in the ratio $1 : \sqrt{n}$, and becomes small when n is great, while the factor $\sqrt{1 + \sigma_w^2/\bar{w}^2}$ is rarely as great as $\sqrt{2}$, since $\sigma_w/\bar{w}$ measures the ratio of the standard deviation of the weights to their mean value, and this ratio in ordinary cases is less than unity.

If the average is unweighted, we have, of course,

$$s_a = \frac{\sigma}{\sqrt{n}} \qquad (61)$$
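Formulas (56) and (60) can be compared numerically. The sketch below assumes every item has the same standard deviation σ, in which case the two forms should agree exactly; the weights and the value of σ are made-up illustrative figures, and the function names are ours.

```python
import math

def sd_weighted_average(weights, sds):
    """Formula (56): s_a = sqrt(S(W^2 sigma^2)) / S(W)."""
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sds))) / sum(weights)

def sd_weighted_average_equal_sigma(weights, sigma):
    """Formula (60): s_a = (sigma / sqrt(n)) * sqrt(1 + sigma_w^2 / w_bar^2)."""
    n = len(weights)
    w_bar = sum(weights) / n
    var_w = sum((w - w_bar) ** 2 for w in weights) / n
    return sigma / math.sqrt(n) * math.sqrt(1 + var_w / w_bar ** 2)

weights = [1.0, 2.0, 3.0, 4.0]   # hypothetical weights
sigma = 2.0                      # hypothetical common s.d. of the items

by_56 = sd_weighted_average(weights, [sigma] * len(weights))
by_60 = sd_weighted_average_equal_sigma(weights, sigma)
unweighted = sd_weighted_average_equal_sigma([1.0] * 4, sigma)   # reduces to (61)
print(by_56, by_60, unweighted)
```

With equal weights the extra factor is 1 and the result is σ/√n, as in (61).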


The fundamental formula $s^2 = S(W_t^2\sigma_t^2)$ was used by the Committee of the British Association on Small Incomes. (See Statistical Journal, 1910, p. 62, where different letters are employed.)

There were 31 classes in each of which the number not paying income tax was estimated as, say, $N_t$, with standard deviation $s_t$; their average income was $I_t$ with standard deviation $s'_t$. The aggregate income of the class is then $N_tI_t$ with standard deviation $\sigma_t$, where

$$\sigma_t^2 = \text{Mean}\{(N_t + e_t)(I_t + e'_t) - N_tI_t\}^2 = \text{Mean}\{N_te'_t + I_te_t\}^2 = N_t^2s_t'^2 + I_t^2s_t^2,$$

when products of e's are neglected.

The standard deviation for the sum of $N_tI_t$ is therefore s, where

$$s^2 = S(N_t^2s_t'^2 + I_t^2s_t^2).$$

$s_1, s_2, \ldots$ and $s'_1, s'_2, \ldots$ were estimated separately for each class.

If we suppose that the numbers in the classes were known exactly and only the average incomes in the classes subject to error, then we should use the formula above, $s^2 = S(W_t^2\sigma_t^2)$, which is in this case $S(N_t^2s_t'^2)$, and which we of course also obtain by writing $s_t = 0$. The standard deviation of error in the average income of all the classes taken together is then

$$\frac{\sqrt{S(N_t^2s_t'^2)}}{SN_t}.$$


In the investigation $S(I_t^2s_t^2) = 315 \times 10^5$, $S(N_t^2s_t'^2) = 4 \times 10^8$, so that the errors in N were not important. $S(N_t) = 4023$, $S(N_tI_t) = 284{,}700$, and the average income in 1910 of persons other than wage-earners not paying income tax may be written as £71 with standard deviation £5.
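The closing figures can be checked from the four totals quoted. The sketch assumes the totals are expressed in mutually consistent units (the text does not restate them), and the variable names are ours.

```python
import math

sum_I2_s2 = 315e5      # S(I^2 s^2): contribution of errors in the numbers N
sum_N2_s2 = 4e8        # S(N^2 s'^2): contribution of errors in the average incomes I
sum_N = 4023           # S(N)
sum_NI = 284_700       # S(N I)

average_income = sum_NI / sum_N
sd_both_errors = math.sqrt(sum_N2_s2 + sum_I2_s2) / sum_N
sd_income_errors_only = math.sqrt(sum_N2_s2) / sum_N

print(round(average_income, 1), round(sd_both_errors, 2), round(sd_income_errors_only, 2))
```

Both versions of the standard deviation round to 5, which illustrates the remark that the errors in N were not important.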


Relative  Errors. 


So  far  we  have  dealt  with  absolute  errors  and  deviations, 
the  actual  differences  between  particular  values  and  observa¬ 
tions  from  their  means  or  true  values.  It  is  now  proposed  to 
discuss  relative  errors  and  deviations  (as  used  in  Part  I., 
Chapter  VIII,  supra). 

If x is the observed value of a quantity whose true value or mean (as the case may be) is x', and $x = x'(1 + e)$, then $e = \frac{x - x'}{x'}$ is the relative error or deviation.*


1.  Products  and  Quotients. 

If two factors $F_1$, $F_2$ are independent, and erroneously measured as $F_1(1 + e_1)$, $F_2(1 + e_2)$, and e is the resulting relative error in their product P, we have

$$P(1 + e) = F_1(1 + e_1) \cdot F_2(1 + e_2), \text{ where } P = F_1F_2 \qquad (62)$$

$$e = e_1 + e_2 + e_1e_2 = e_1 + e_2, \text{ if products of } e\text{'s are negligible.}$$

Hence if $\sigma$, $\sigma_1$, $\sigma_2$ are the standard deviations of P, $F_1$, $F_2$, we have by the formula (34), p. 288, $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

The result can be extended to any finite number of factors, so that

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \ldots \qquad (63)$$

The error of $x^n$, n finite, is given by $x^n(1 + e) = \{x(1 + e_1)\}^n$, where $e_1$ is the error of x.

$$e = ne_1 + \frac{n(n-1)}{2}e_1^2 + \ldots = ne_1 \qquad (64)$$

when squares are neglected, and the standard deviation of $x^n$ is $n\sigma$ where σ is that of x.


The result is true when n is fractional. E.g. the error in a cube root is one-third the error of the quantity. Thus if a number 1006 is taken as 1000 (relative error .006), the relative error in the cube root is .002, the root being given as 10, instead of 10.02 = 10(1 + .002) approx.
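The cube-root illustration is easily verified. In the sketch below the relative error is reckoned on 1000, as in the text, and the first-order value $ne_1$ with n = 1/3 is compared with the exact one; the variable names are ours.

```python
taken = 1000.0
true_value = 1006.0

e_quantity = (true_value - taken) / taken             # .006
e_root_first_order = e_quantity / 3                   # formula (64) with n = 1/3
e_root_exact = true_value ** (1 / 3) / taken ** (1 / 3) - 1

print(e_quantity, e_root_first_order, e_root_exact)
```

The two values for the root differ only by a term of the order of the neglected square of $e_1$.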

* In Chapter VIII above it was more convenient to take $\frac{x' - x}{x}$ as the error. If we call this $e_1$ the relation between the two is $-e_1 = e - e^2 + e^3 \ldots$, and $e_1 = -e$, when, as may generally be presumed, $e^2$ is negligible.

If e is the error in $Q = F_1/F_2$, and $F_1$ and $F_2$ are independent of each other,

$$Q(1 + e) = \frac{F_1(1 + e_1)}{F_2(1 + e_2)} \qquad (65)$$

$$e = (1 + e_1)(1 + e_2)^{-1} - 1 = e_1 - e_2 + \text{squares and products.}$$

$$\sigma_q^2 = \sigma_1^2 + \sigma_2^2, \text{ where } \sigma_q \text{ is the standard deviation of } e. \qquad (66)$$

If e is the error in a power, $a^x$, where a is known, and $e_1$ is the error in x,

$$a^x(1 + e) = a^{x(1 + e_1)}$$

$$e = a^{xe_1} - 1 = e_1 \cdot x\log a \text{ when } e_1^2 \text{ is neglected} \qquad (67)$$

Generally if e is the error in a function, f(x),

$$f(x) \times (1 + e) = f\{x(1 + e_1)\} = f(x) + e_1xf'(x) + \ldots$$

and

$$e = x\frac{f'(x)}{f(x)}e_1 \qquad (68)$$
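Formula (68) says that a small relative error in x is multiplied by $xf'(x)/f(x)$ in passing to f(x). A minimal numerical sketch, in which the derivative is estimated by a central difference (our choice, not the text's) and the logarithm in (67) is taken as the natural logarithm:

```python
import math

def error_multiplier(f, x, h=1e-6):
    """Estimate x f'(x) / f(x), the factor of formula (68), by a central difference."""
    derivative = (f(x * (1 + h)) - f(x * (1 - h))) / (2 * x * h)
    return x * derivative / f(x)

cube = error_multiplier(lambda x: x ** 3, 2.0)            # formula (64): n = 3
cube_root = error_multiplier(lambda x: x ** (1 / 3), 2.0) # fractional power: n = 1/3
power_of_ten = error_multiplier(lambda x: 10 ** x, 2.0)   # formula (67): x log a

print(cube, cube_root, power_of_ten)
```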


2.  In  Averages. 

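Let $\bar{m}$ be the unweighted average of n quantities $M_1, M_2, \ldots, M_t, \ldots, M_n$, and let $M_t = \bar{m} + m_t$, so that $Sm_t = 0$. Let $n\sigma_m^2 = Sm_t^2$.

Suppose the quantities erroneously observed as $M_t(1 + e_t)$ for $M_t$, etc., and let e be the relative error in their average.

$$\bar{m}(1 + e) = \frac{1}{n}S\{M_t(1 + e_t)\} = \bar{m} + \frac{1}{n}S(M_te_t)$$

Then

$$e = \frac{1}{n}S\left(\frac{M_t}{\bar{m}}e_t\right) \qquad (69)$$

If $s_a$, $\sigma_t$ are the standard deviations of e, $e_t$, then by formula (55), p. 316,

$$s_a^2 = \frac{1}{n^2}S\left(\frac{M_t^2}{\bar{m}^2}\sigma_t^2\right) = \frac{\sigma^2}{n^2}S\left(\frac{M_t}{\bar{m}}\right)^2,$$

if $\sigma^2$ is the weighted mean of $\sigma_1^2 \ldots \sigma_t^2 \ldots$, or if all these standard deviations are equal;

$$s_a^2 = \sigma^2\frac{S(\bar{m} + m_t)^2}{n^2\bar{m}^2} = \sigma^2\frac{\bar{m}^2 + \sigma_m^2}{n\bar{m}^2},$$

since $Sm_t = 0$, and

$$s_a = \frac{\sigma}{\sqrt{n}}\sqrt{1 + \frac{\sigma_m^2}{\bar{m}^2}} \qquad (70)$$

This formula is of the same form as that relating to absolute errors in a weighted average. (Formula (60).)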




In this formula $\sigma_m$, $\bar{m}$ and $\sqrt{n}$ are known, and therefore the ratio $s_a/\sigma$ can be stated exactly. σ has to be estimated from whatever circumstances are known about the individual measurements.

The conditions of p. 299 are generally satisfied when an average is computed, if the conditions of random sampling are preserved, and therefore the normal table of frequency is applicable if n is large; it is approximately applicable if n is no greater than 20.
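The step from the sum $S(M_t/\bar{m})^2$ to the closed form (70) can be confirmed on any set of figures. The quantities and the common relative error below are hypothetical, chosen only for illustration, and the function name is ours.

```python
import math

def relative_sd_of_average(values, sigma):
    """Return s_a computed directly from S(M/m_bar)^2 and from formula (70)."""
    n = len(values)
    m_bar = sum(values) / n
    direct = sigma / n * math.sqrt(sum((m / m_bar) ** 2 for m in values))
    var_m = sum((m - m_bar) ** 2 for m in values) / n
    closed_form = sigma / math.sqrt(n) * math.sqrt(1 + var_m / m_bar ** 2)
    return direct, closed_form

# Hypothetical quantities, each with relative s.d. of .05.
direct, closed_form = relative_sd_of_average([10.0, 20.0, 30.0, 40.0], 0.05)
print(direct, closed_form)
```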


3.  In  Weighted  Averages. 

[Based  on  article  in  Stat.  Journal,  1911-12,  pp.  81-88]. 

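Let $m_w = \frac{S(W_tM_t)}{SW_t}$, where $M_t$ (and $\bar{m}$, $\sigma_m$) have the same meaning as before, and $W_1, W_2, \ldots, W_t, \ldots, W_n$ are weights.

Let $W_t = \bar{w} + w_t$, where $n\bar{w} = SW_t$ and $Sw_t = 0$, and let $n\sigma_w^2 = Sw_t^2$.

Then $S(W_tM_t) = n\bar{w}m_w$.

Suppose that the weights are imperfectly known, so that $W_t(1 + \eta_t)$ is taken instead of $W_t$, etc.

Let the errors in the M's be as before, and let e be the resulting error in $m_w$.

Then

$$m_w(1 + e) = \frac{S\{W_t(1 + \eta_t)M_t(1 + e_t)\}}{S\{W_t(1 + \eta_t)\}}$$

$$e = \frac{S\{W_t(1 + \eta_t)M_t(1 + e_t)\} \cdot SW_t - S(W_tM_t) \cdot S\{W_t(1 + \eta_t)\}}{S(W_tM_t) \cdot S\{W_t(1 + \eta_t)\}}$$

$$= \frac{S(W_tM_te_t) \cdot SW_t + S(W_tM_t\eta_t) \cdot SW_t - S(W_t\eta_t) \cdot S(W_tM_t)}{S(W_tM_t) \cdot SW_t},$$

neglecting $e\eta$ and $\eta^2$,

$$= \frac{S(W_tM_te_t)}{S(W_tM_t)} + \frac{S\{(W_tM_t \cdot n\bar{w} - W_t \cdot n\bar{w}m_w)\eta_t\}}{S(W_tM_t) \cdot n\bar{w}}$$

$$= \frac{S(W_tM_te_t)}{n\bar{w}m_w} + \frac{S\{W_t(M_t - m_w)\eta_t\}}{n\bar{w}m_w} \qquad (71)$$

Now

$$m_w = \frac{S\{(\bar{w} + w_t)(\bar{m} + m_t)\}}{n\bar{w}} = \frac{n\bar{w}\bar{m} + \bar{m}Sw_t + \bar{w}Sm_t + Sw_tm_t}{n\bar{w}} = \bar{m}\left\{1 + \frac{1}{n}S\left(\frac{w_t}{\bar{w}}\frac{m_t}{\bar{m}}\right)\right\}$$

and

$$m_w - \bar{m} = \frac{1}{n\bar{w}}S(w_tm_t) \qquad (72)$$

$$e = \frac{S(W_tM_te_t)}{n\bar{w}m_w} + \frac{S(W_tm_t\eta_t)}{n\bar{w}m_w}$$

approximately, if the difference $\bar{m} - m_w$ is neglected, while in full the numerator of the second term is $S\{W_t(M_t - m_w)\eta_t\}$.

Let $\bar{\sigma}$, $\sigma_t$, $\sigma'_t$ be the standard deviations of e, $e_t$, $\eta_t$.

Then

$$\bar{\sigma}^2 = \left(\frac{1}{n\bar{w}m_w}\right)^2\left\{S(W_tM_t\sigma_t)^2 + S[W_t(M_t - m_w)\sigma'_t]^2\right\} \qquad (73)$$

Let $\sigma_1 = \sigma_2 = \ldots = \sigma$, and $\sigma'_1 = \sigma'_2 = \ldots = \sigma'$. Or let $\sigma^2$, $\sigma'^2$ be weighted averages so that $\sigma^2S(W_tM_t)^2 = S(W_tM_t\sigma_t)^2$ and $\sigma'^2S\{W_t(M_t - m_w)\}^2 = S\{W_t(M_t - m_w)\sigma'_t\}^2$.

Then

$$\bar{\sigma}^2 = \frac{1}{(n\bar{w}m_w)^2}\left\{\sigma^2 \cdot S(W_tM_t)^2 + \sigma'^2S\{W_t(M_t - m_w)\}^2\right\} \qquad (74)$$

σ and σ' must be estimated from whatever errors seem probable or possible in the circumstances of the measurement.

The other quantities involved can be calculated from the observations. A good approximation in ordinary cases to this result is

$$\bar{\sigma}^2 = \frac{\sigma^2}{n}\left(1 + \frac{\sigma_w^2}{\bar{w}^2}\right)\left(1 + \frac{\sigma_m^2}{\bar{m}^2}\right) + \frac{\sigma'^2}{n}\left(1 + \frac{\sigma_w^2}{\bar{w}^2}\right)\frac{\sigma_m^2}{\bar{m}^2} \qquad (75)$$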


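This approximation is obtained as follows:

$$S(W_tM_t)^2 = S\{(\bar{w}^2 + 2\bar{w}w_t + w_t^2)(\bar{m}^2 + 2\bar{m}m_t + m_t^2)\}$$

$$= n\bar{w}^2\bar{m}^2 + n\bar{w}^2\sigma_m^2 + n\bar{m}^2\sigma_w^2 + n\sigma_w^2\sigma_m^2 + Sw_t^2(m_t^2 - \sigma_m^2) + 4\bar{m}\bar{w}Sw_tm_t + 2\bar{w}Sw_tm_t^2 + 2\bar{m}Sm_tw_t^2$$

$$\frac{S(W_tM_t)^2}{n(\bar{w}\bar{m})^2} = \left(1 + \frac{\sigma_w^2}{\bar{w}^2}\right)\left(1 + \frac{\sigma_m^2}{\bar{m}^2}\right) + \frac{\sigma_w^2\sigma_m^2}{\bar{w}^2\bar{m}^2}R_{22} + \frac{4\sigma_w\sigma_m}{\bar{w}\bar{m}}r + \frac{2\sigma_w\sigma_m^2}{\bar{w}\bar{m}^2}r_{12} + \frac{2\sigma_w^2\sigma_m}{\bar{w}^2\bar{m}}r_{21},$$

where

$$r = \frac{Swm}{n\sigma_w\sigma_m}, \quad r_{12} = \frac{Swm^2}{n\sigma_w\sigma_m^2}, \quad r_{21} = \frac{Sw^2m}{n\sigma_w^2\sigma_m}, \quad R_{22} = \frac{Sw^2(m^2 - \sigma_m^2)}{n\sigma_w^2\sigma_m^2}.$$

$$S\{W_t(M_t - m_w)\}^2 = S\{W_t(m_t + \bar{m} - m_w)\}^2 = SW_t^2m_t^2 + 2(\bar{m} - m_w)SW_t^2m_t + (\bar{m} - m_w)^2SW_t^2$$

$$= \bar{w}^2 \cdot n\sigma_m^2 + 2\bar{w}Sw_tm_t^2 + Sw_t^2m_t^2 - \frac{2}{n\bar{w}}Sw_tm_t\,(2\bar{w}Sw_tm_t + Sw_t^2m_t) + \left(\frac{Sw_tm_t}{n\bar{w}}\right)^2n(\bar{w}^2 + \sigma_w^2)$$

$$\frac{S\{W_t(M_t - m_w)\}^2}{n(\bar{w}\bar{m})^2} = \frac{\sigma_m^2}{\bar{m}^2} + \frac{2\sigma_w\sigma_m^2}{\bar{w}\bar{m}^2}r_{12} + \frac{\sigma_w^2\sigma_m^2}{\bar{w}^2\bar{m}^2}(1 + R_{22}) - \frac{4\sigma_w^2\sigma_m^2}{\bar{w}^2\bar{m}^2}r^2 - \frac{2\sigma_w^3\sigma_m^2}{\bar{w}^3\bar{m}^2}rr_{21} + \frac{\sigma_w^2\sigma_m^2}{\bar{w}^2\bar{m}^2}\left(1 + \frac{\sigma_w^2}{\bar{w}^2}\right)r^2$$

$$\bar{\sigma}^2 = \frac{1}{n}\left(\frac{\bar{m}}{m_w}\right)^2(f_1^2\sigma^2 + f_2^2\sigma'^2),$$

where

$$f_1^2 = \left(1 + \frac{\sigma_w^2}{\bar{w}^2}\right)\left(1 + \frac{\sigma_m^2}{\bar{m}^2}\right) + 4\frac{\sigma_w}{\bar{w}}\frac{\sigma_m}{\bar{m}}r + 2\frac{\sigma_w}{\bar{w}}\frac{\sigma_m^2}{\bar{m}^2}r_{12} + 2\frac{\sigma_w^2}{\bar{w}^2}\frac{\sigma_m}{\bar{m}}r_{21} + \frac{\sigma_w^2}{\bar{w}^2}\frac{\sigma_m^2}{\bar{m}^2}R_{22}$$

and

$$f_2^2 = \frac{\sigma_m^2}{\bar{m}^2}\left\{1 + \frac{\sigma_w^2}{\bar{w}^2} + \left(\frac{\sigma_w^4}{\bar{w}^4} - 3\frac{\sigma_w^2}{\bar{w}^2}\right)r^2 + 2\frac{\sigma_w}{\bar{w}}r_{12} - 2\frac{\sigma_w^3}{\bar{w}^3}rr_{21} + \frac{\sigma_w^2}{\bar{w}^2}R_{22}\right\}$$

Now r, $r_{12}$, $r_{21}$, $R_{22}$ each contain in their formulae factors ($m_t$, $w_t$ or $m_t^2 - \sigma_m^2$) whose sum is zero, and therefore, unless large values of the other factor ($m_t^2$, $w_t^2$) are found specially with positive or specially with negative values, the sum of the products is small, and terms containing these tend to be small in comparison with the other terms.

Also

$$\frac{m_w}{\bar{m}} = 1 + r\frac{\sigma_w\sigma_m}{\bar{w}\bar{m}}.$$

If we neglect r, $r_{12}$, $r_{21}$, $R_{22}$ we obtain the approximation given above.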

Examples. 

Some  examples,  worked  in  detail,  will  show  the  relative 
magnitude  of  the  quantities  involved. 

1. The first is a calculation of wages, where the weights are taken with great roughness, the number of persons of both sexes and all ages being taken for weights to compute the average wage of men only. It is only in very imperfect investigations that so deliberate an error would be introduced.

The contribution due to the errors in observations of quantities, typified by σ, has in the approximate formula two factors, each of which is always greater than 1 and generally less than 2; these factors can be computed from the observations.

On the other hand the contribution due to errors in weights, typified by σ' in (75), contains the factor $(\sigma_m/\bar{m})^2$ both in the approximate and in the complete formula, i.e. the square of the ratio of the standard deviation of the quantities to their mean value. In the cases, which are quite common when weighted averages are in question, where this ratio is small, the effect of errors in weights is smaller, and sometimes very much smaller, than the effect of equal errors in quantities. Hence the statements (pp. 94 and 185) that under ordinary conditions more attention should be paid to accuracy in quantities than to accuracy in weights.

Finally, as regards weighted averages, the table of probability on p. 271 may be applied to measure the chance of deviations greater than $\bar{\sigma}, 2\bar{\sigma}, 3\bar{\sigma} \ldots$ if n is great, and it gives approximate values when n is as small even as 20.

Metal Trades, Excluding Engineering and Shipbuilding, 1906.
Cd. 5814, p. xi for numbers, and p. xiii for wages.

                             Number of persons      Average earnings
Trade.                       employed, W.           of men, M.
                             (000's.)               (s.  d.)

Pig iron                          14                  34   4
Iron and steel                    54                  39   1
Tinplate                          11                  42   0
Railway carriages                 46                  30   9
Iron castings                     12                  31   4
Electric apparatus                15                  34   7
Wire                               8                  35   7
Brass, etc.                        8                  31   9
Gold, silver, etc.                 8                  36   6
Jewellery                          3                  38   0
Edge tools                         3                  31   2
Smelting                           8                  31   5
Cycles                             7                  34   4
Tubes                              7                  28   3
Nails, etc.                        5                  31   0
Bedsteads                          2                  36   3
Farriery                           2                  27   9
Scientific instruments             2                  36  10
Needles, etc.                      2                  31   9
Chains, etc.                       1                  35   4
Locks, etc.                        1                  28   0
Watches and clocks                 1                  32   7
Typefounding                       1                  33   3
Miscellaneous                     45                  32   5

Total                            266

n, the number of trades, = 24.

S.W = 266. $\bar{w} = \frac{1}{n}S.W = 11\tfrac{1}{12}$.

$\bar{m}$, the arithmetical average of the 24 entries of earnings, = 33s. 6d. = 33.511s.

$\sigma_m = 3.47$. $\sigma_w = 14.74$. w and m are the deviations of individual entries from $\bar{w}$ and $\bar{m}$.

$$r = \frac{Swm}{n\sigma_w\sigma_m} = .150, \quad r_{12} = \frac{Swm^2}{n\sigma_w\sigma_m^2} = .095, \quad r_{21} = \frac{Sw^2m}{n\sigma_w^2\sigma_m} = .280, \quad R_{22} = .264,$$

$$\frac{\sigma_m}{\bar{m}} = .104, \quad \frac{\sigma_w}{\bar{w}} = 1.33, \quad \frac{\sigma_w^2}{\bar{w}^2} = 1.77.$$

$m_w$, the average of the earnings with the numbers given in the table as weights, = 34s. 2½d. $= \bar{m}\left(1 + r\frac{\sigma_w\sigma_m}{\bar{w}\bar{m}}\right)$.

$$\left(\frac{\bar{m}}{m_w}\right)^2 = .959.$$

Then working with the notation of p. 321,

$f_1^2$ = 2.77 × 1.011 + 4 × 1.33 × .104 × .150 + 2 × 1.33 × .011 × .095 + 2 × 1.77 × .103 × .280 + 1.77 × .011 × .264
= 2.80 + .083 + .003 + .102 + .005 = 2.99.

$f_1^2\left(\frac{\bar{m}}{m_w}\right)^2 = 2.87$. The approximate formula gives 2.80.

$f_2^2$ = .011{2.77 + (3.13 - 3.99) × .0225 + 2 × 1.33 × .095 - 2 × 2.35 × .150 × .280 + 1.77 × .264}
= .011{2.77 - .020 + .253 - .198 + .467} = .011 (2.77 + .50) = .036.

The approximate formula gives .031.

$$\bar{\sigma}^2 = \tfrac{1}{24}(2.87\sigma^2 + .035\sigma'^2).$$

The averages of the men's earnings in the separate trades are perhaps subject to an error of 6d. in 33s., in which case $\sigma = \frac{1}{66}$, $\sigma^2 = .00023$.

The errors in the weights may be considerable, for the weights were deliberately taken as the whole number of persons instead of the number of men.

The error so introduced is computed from p. 10 of the report at about .23, so that $\sigma'^2 = .053$.

With these figures $\bar{\sigma}^2 = .000027 + .000077 = .000104$. $\bar{\sigma} = .01$.

Hence the average may be written

$m_w(1 \pm \bar{\sigma})$ or 34s. 2½d. ± 4d.

Though in this extreme case the error in the individual weights is taken 15 times as great as the error in the quantities, the resulting error is only .0088 as compared with .0052 due to quantities.
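The main quantities of this example can be recomputed from the table of trades. The sketch below converts earnings to pence, applies only the approximate formula (75) rather than the complete one, and takes σ = 1/66 and σ' = .23 from the text; all variable names are illustrative assumptions.

```python
import math

# W in thousands of persons; M as (shillings, pence), in the order of the table.
W = [14, 54, 11, 46, 12, 15, 8, 8, 8, 3, 3, 8, 7, 7, 5, 2, 2, 2, 2, 1, 1, 1, 1, 45]
M_SD = [(34, 4), (39, 1), (42, 0), (30, 9), (31, 4), (34, 7), (35, 7), (31, 9),
        (36, 6), (38, 0), (31, 2), (31, 5), (34, 4), (28, 3), (31, 0), (36, 3),
        (27, 9), (36, 10), (31, 9), (35, 4), (28, 0), (32, 7), (33, 3), (32, 5)]
M = [12 * s + d for s, d in M_SD]            # earnings in pence

n = len(W)
w_bar, m_bar = sum(W) / n, sum(M) / n
var_w = sum((w - w_bar) ** 2 for w in W) / n
var_m = sum((m - m_bar) ** 2 for m in M) / n
sd_w, sd_m = math.sqrt(var_w), math.sqrt(var_m)

# Correlation between weights and quantities, and the weighted average.
r = sum((w - w_bar) * (m - m_bar) for w, m in zip(W, M)) / (n * sd_w * sd_m)
m_w = sum(w * m for w, m in zip(W, M)) / sum(W)

# Approximate formula (75): coefficients of sigma^2 and sigma'^2 (before dividing by n).
quantity_factor = (1 + var_w / w_bar ** 2) * (1 + var_m / m_bar ** 2)
weight_factor = (1 + var_w / w_bar ** 2) * (var_m / m_bar ** 2)

sigma_q = 1 / 66      # 6d. in 33s.
sigma_wt = 0.23       # relative error in the weights
sigma_bar = math.sqrt((quantity_factor * sigma_q ** 2
                       + weight_factor * sigma_wt ** 2) / n)

print(f"m_bar = {m_bar / 12:.3f}s., m_w = {m_w:.1f}d., r = {r:.3f}")
print(f"factors {quantity_factor:.2f}, {weight_factor:.3f}; sigma_bar = {sigma_bar:.4f}")
```

Because the complete formula includes the terms in r, $r_{12}$, $r_{21}$ and $R_{22}$, the value of $\bar{\sigma}$ found here is a little below the .01 reached in the text.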

2.  Perhaps  the  most  important  use  of  weighted  averages 
is  as  index-numbers  of  prices. 

It  was  shown  above  (p.  204)  that  the  change  of  the  base 
year  was  equivalent  to  a  change  of  weights  ;  such  a  change 
will  by  the  theory  used  in  this  chapter  produce  an  unimportant 
effect  on  the  result,  if  the  necessary  conditions  are  found  to  hold. 

Sauerbeck's numbers of the prices of commodities were tabulated for 1900 and 1911 and re-written with 1900 as base. Thus in the first entry the price of English wheat was 49 in 1900, 58 in 1911, when the average of the years 1867-77 is taken as 100. This was written 100 in 1900 and 118 in 1911. The 45 numbers so obtained give an arithmetical average 107.82, while the averages of the numbers as given by Sauerbeck in 1900 and 1911 were 75.07 and 79.69, whose ratio is 100 : 106.16.

In taking the simple average, when all the numbers are 100, we in effect give equal weights to the ratios; whereas in Sauerbeck's setting, if $p_1, p_2, \ldots$ are the separate index-numbers in 1900 and $p_1', p_2', \ldots$ in 1911, the general index-numbers are $I_1 = \frac{Sp}{45}$, $I_2 = \frac{Sp'}{45}$, and $I = 100\frac{I_2}{I_1}$ gives the movement from 1900 to 1911, i.e. 106.16.

$$I = 100\frac{Sp'}{Sp} = 100\frac{S\left(p \cdot \frac{p'}{p}\right)}{Sp};$$

that is the ratios of the separate changes are weighted with the separate index-numbers in 1900.

We will examine the accuracy of the average on Sauerbeck's system, that is taking p, now written w, as a weight, and $\frac{p'}{p}$, now written m, as a quantity.

The quantities involved are the following:

$\bar{w} = 75.07$, $\sigma_w = 20.67$, $\frac{\sigma_w}{\bar{w}} = .275$, $\bar{m} = 107.82$, $\sigma_m = 20.03$, $\frac{\sigma_m}{\bar{m}} = .186$, $r = -.2944$, $m_w = 106.2$, $r_{12} = .506$, $r_{21} = -.347$, $R_{22} = .936$.

If we neglect r, $r_{12}$, $r_{21}$, $R_{22}$, as in formula (75),

$$\bar{\sigma}^2 = \frac{\sigma^2}{45}(1.076)(1.035) + \frac{\sigma'^2}{45}(1.076) \times (.186)^2 = \sigma^2 \times .025 + \sigma'^2 \times .00083.$$

If we include these quantities $\bar{\sigma}^2 = \sigma^2 \times .024 + \sigma'^2 \times .0012$.

The difference between the two is almost solely due to $r_{12}$, i.e. to mean $wm^2$; abnormal increases from 1900 to 1911, measured by m, are on the whole found with abnormal movements from the base 1867-77 measured by w; but even this influence has not much effect.

The  error,  cr,  in  m  is  almost  solely  due  to  using  round 
numbers,  and  tends  to  be  about  and  hence 

$\sigma^2 \times .024 = (.0005)^2$,

and is negligible.

The error in w could be computed if we had a definite system of assigning importance to the commodities. In default of this, suppose they ought to have had equal weights, as in the alternative computation above. Then $\frac{\sigma_w}{\bar{w}} = .275$ measures the dispersion of the actual weights from the supposed true weights, and $\sigma'^2 \times .0012 = (.275 \times .034)^2 = (.0093)^2$.

Hence $\bar{\sigma}^2 = (.0005)^2 + (.0093)^2 = (.0093)^2$ approx., and the index-number may be written

106.2 (1 ± .0093) = 106.2 ± 1,

and this shows the kind of margin we should have in mind when using index-numbers.

Actually the difference between the numbers calculated on the two hypotheses is 107.8 - 106.2 = 1.6.
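The coefficients obtained from formula (75) in this example, and the weighted index itself, can be recovered from the summary figures alone. The sketch below uses only the values quoted in the text (n = 45, the two averages, the two standard deviations and r); the function name is ours.

```python
def approximate_coefficients(n, cv_w, cv_m):
    """Coefficients of sigma^2 and sigma'^2 in the approximation (75).

    cv_w and cv_m are the ratios sigma_w / w_bar and sigma_m / m_bar.
    """
    spread_of_weights = 1 + cv_w ** 2
    coeff_quantities = spread_of_weights * (1 + cv_m ** 2) / n
    coeff_weights = spread_of_weights * cv_m ** 2 / n
    return coeff_quantities, coeff_weights

w_bar, sd_w = 75.07, 20.67
m_bar, sd_m = 107.82, 20.03
r = -0.2944

coeff_q, coeff_w = approximate_coefficients(45, 0.275, 0.186)

# Weighted average from the relation m_w / m_bar = 1 + r * (sd_w / w_bar) * (sd_m / m_bar).
m_w = m_bar * (1 + r * (sd_w / w_bar) * (sd_m / m_bar))

print(round(coeff_q, 3), round(coeff_w, 5), round(m_w, 1))
```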

Comparison  of  Averages. 

If  the  errors  in  the  two  investigations  are  quite  inde¬ 
pendent  and  lead  to  averages  in  the  form  Ax  (1  ±  cr1), 
A2  (i  dt  <r2)>  the  standard  deviation  of  A3/A2  is  Voq2  +  a22,  by 
formula  (66). 

But  it  often  happens  that  errors  in  the  same  sense  (both 
positive  or  both  negative)  are  made  in  corresponding  items  at 
both  dates  ;  thus  the  wages  of  a  class  may  be  underestimated 
at  both  dates.  In  such  cases  the  error  is  reduced  by  the  com¬ 
parison. 

Thus to take the case of a simple quotient $Q = F_1 \div F_2$:

If $e_1$ and $e_2$ are the relative errors in $F_1$, $F_2$ and their standard
deviations are $\sigma_1$, $\sigma_2$, then the error in $Q$ has standard deviation
$\sqrt{\sigma_1^2 + \sigma_2^2}$ by formula (66).

But, if $d = e_1 - e_2$, mean $d^2$ = mean $e_1^2$ + mean $e_2^2$ $-$ 2 mean $e_1 e_2$.
The last term only vanishes if all values of $e_2$ are equally likely to
occur with any value of $e_1$, and not if $e_1$ and $e_2$ are likely to be of
the same sign.

E.g. if $e_2 = \frac{1}{2} e_1$ always, $\sigma_2^2 = \frac{1}{4}\sigma_1^2$, mean $e_1 e_2 = \frac{1}{2}$ mean $e_1^2 = \frac{1}{2}\sigma_1^2$,
and mean $d^2 = \sigma_1^2 + \frac{1}{4}\sigma_1^2 - \sigma_1^2 = \frac{1}{4}\sigma_1^2$, and the standard deviation of the
ratio is $\frac{1}{2}\sigma_1$.
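The arithmetic of this illustration can be checked with a short sketch. The function name and the value chosen for $\sigma_1$ are hypothetical; only the identity for mean $d^2$ comes from the text.

```python
import math

def sd_of_difference(sigma1, sigma2, mean_e1e2):
    """S.D. of d = e1 - e2, from mean d^2 = mean e1^2 + mean e2^2 - 2 mean e1 e2."""
    return math.sqrt(sigma1**2 + sigma2**2 - 2.0 * mean_e1e2)

sigma1 = 0.04            # hypothetical relative error in F1
sigma2 = sigma1 / 2      # the text's illustration: e2 = e1/2 always

# Independent errors: the product term vanishes.
independent = sd_of_difference(sigma1, sigma2, 0.0)

# Errors always of the same sign with e2 = e1/2: mean e1*e2 = sigma1^2 / 2.
linked = sd_of_difference(sigma1, sigma2, 0.5 * sigma1**2)

print(independent, linked)
```

With linked errors the standard deviation of the ratio comes out at half of $\sigma_1$, against about 1.12 times $\sigma_1$ when the same two errors are treated as independent.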

The  necessary  analysis  for  the  ratios  of  weighted  and  unweighted 
averages  is  given  in  the  Appendix,  Notes  7  and  8. 

The  approximate  formulas  are  as  follows,  the  notation  being 
as  on  the  previous  page. 

If $s_r$ is the standard deviation of the ratio of two unweighted
averages,

where $\sigma_d$ is the standard deviation of the difference between $e_t$ and
$e_t'$, the errors in measurements of the corresponding quantities
$M_t$, $M_t'$ at the two dates.

While  if  sr  is  the  standard  deviation  of  the  ratio  of  two 
weighted  averages,  then  approximately  under  certain  conditions 


where $\sigma_d$ is as before, $\sigma$ and $\sigma'$ are standard deviations of the errors
in quantities and weights, $M_t' = (1 + u + u_t) M_t$, where $S u_t = 0$, so
that $1 + u$ measures the mean rate of growth of the quantities,
and $\sigma_u$ is the standard deviation of $u_t$ and measures the scattering
of the rates of growth.

If then the errors in the quantities tend to be the same at
both periods, the first term in the bracket { } in (77) is small,
and if the quantities grow at nearly the same rate the second
term is small. In any case the standard deviation diminishes
with $\frac{1}{\sqrt{n}}$.

Under conditions which are often fulfilled it follows that
very great accuracy can be obtained in the ratio of weighted
averages, though the original errors in the measurement of
quantities and in the systems of weighting are considerable.
It is important not to vary the methods of computation, so
as to obtain similar errors and a small value of $\sigma_d$.

Example. 

Data for Estimating the Change in Average Weekly Wages in Certain
Industries in the United Kingdom.

                              1880.                     1900.
                        Numbers    Wages         Numbers    Wages         Ratio of increase
                        W          M             W'         M'            of M,
                        (0000's)   (shillings)   (0000's)   (shillings)   1 + u + u_t
Agriculture:
  England and Wales       135        15            120        16.2          1.08
  Scotland                 24        18             20        21.2          1.18
  Ireland                  98         9             86        10.4          1.16
Building                   84        27            123        31.0          1.15
Printing                    8        31             13        32.9          1.06
Shipbuilding                7        28.5           13        34.8          1.22
Engineering                72        25            106        30.5          1.22
Coal                       44        23             75        34.3          1.49
Puddling                    9        31             11        38.1          1.23
Cotton                     52        16             54        19.5          1.22
Wool and worsted           12        14             12        13.6           .97
Worsted                    12        14             12        14.4          1.03
Gas                         3        27              8        31.0          1.15
Furniture                  12        23             18        24.8          1.08
                          ---                      ---
                          572                      671
The numbers are of all engaged in the industries from the
General  Report  of  the  Census  of  England  and  Wales,  Table  35. 
The  rates  of  increase  are  from  Mr.  G.  H.  Wood’s  paper  in  the 
Statistical  Journal ,  1909,  p.  93.  The  average  wages  are 
computed  from  various  sources  ;  the  accuracy  of  the  ratios 
is  more  important  than  the  accuracy  of  M. 


$n = 14$, $m = 21.54$, $m' = 25.20$, $m_w = 18.69$, $m'_w = 24.09$,


m  &W 

^_'3  9’  W"'35  '  w 


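$\frac{\sigma_w}{\bar{w}} = 1.00$, $\frac{\sigma_{w'}}{\bar{w}'} = .90$, $u = .160$, $\sigma_u = .12$,

$r = -.42$, $r_{21} = -.44$, $r_{12} = -.42$, $R_{22} = .25$.

Ratio of unweighted averages $= \frac{m'}{m} = 1.170$; of weighted
averages $= \frac{m'_w}{m_w} = 1.288$.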
mw 

Mr. Wood gives $\frac{m'}{m} = 1.163$ and $\frac{m'_w}{m_w} = 1.219$ for these, using
different weights.

s-'2 = ^(x  +  I'°°2)(‘r'i2  +  G+b)  ^ ^ ) 

$= .143\,\sigma_d^2 + .0015\,(\sigma^2 + \sigma'^2)$,

by the approximate formula (77), and by the full formula (148),
App.,

$s_r^2 = .145\,\sigma_d^2 + .022\,\sigma^2 + .0035\,\sigma'^2 + .016\,{\sigma'_d}^2$,

where $\sigma'_d$ measures the difference between the errors in the weights
at the two dates.


The  approximate  formula  fails  to  do  justice  to  the  error  in 
quantities  owing  to  the  great  change  in  the  weights  in  the 
period  whose  effect  is  ignored. 

To see the effect of these errors, suppose the error in the
wages in 1880 ($\sigma$) is $\frac{1}{20}$ and in the weights ($\sigma'$) is $\frac{1}{10}$, and that
similarity of error makes $\sigma_d = \frac{1}{2}\sigma$ and $\sigma'_d = \frac{1}{2}\sigma'$.

Then $s_r^2 = .000091 + .000055 + .000035 + .000040 = .00022$.

$s_r = .015$.

The ratio of the averages may be written

$\frac{m'_w}{m_w}(1 \pm s_r) = 1.288 \pm .020$,

i.e. the percentage increase instead of being 29 may be anywhere
from 27 to 31.
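The four terms of the full formula can be re-added in a few lines. This is only a check of the arithmetic under the errors supposed in the text (one-twentieth in the wages, one-tenth in the weights, halved for the differences); the variable names are illustrative.

```python
import math

sigma = 1 / 20        # supposed error in the 1880 wages
sigma_w = 1 / 10      # supposed error in the weights
sigma_d = sigma / 2   # difference of wage errors at the two dates
sigma_wd = sigma_w / 2  # difference of weight errors at the two dates

# Coefficients as quoted from the full formula (148) in this example.
terms = [
    0.145 * sigma_d**2,
    0.022 * sigma**2,
    0.0035 * sigma_w**2,
    0.016 * sigma_wd**2,
]
s_r = math.sqrt(sum(terms))

ratio = 1.288
margin = ratio * s_r   # absolute margin on the ratio of weighted averages
print([round(t, 6) for t in terms], round(s_r, 4), round(margin, 3))
```

The sum is about .00022 and its root about .015, so the margin on 1.288 is close to .02, as stated.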
Actually the elemental errors may be larger than those
here  supposed.  These  figures  are  given  as  an  example  of 
method  and  to  show  the  influence  of  the  various  terms  ;  but 
n  =  14  is  too  small  for  the  theory  to  be  closely  applicable, 
and  a  serious  study  of  general  wage-changes  would  need  a 
wider  range  of  industries  and  more  exact  determination  of  the 
numbers  and  average  wages. 


Significance  of  Differences  Between  Averages. 

A very important problem that frequently arises in practical
statistics is to determine whether the difference found between
two averages of similar classes or groups could be due to the
error incident to observation (especially to the inclusion of
too small a number in a random sample) or can safely be
attributed to real differences of characteristics. E.g., if the
observed death-rates of two classes are 14.7 and 14.3 per 1000,
are we justified in saying that the death-rate of the first class
is the higher, or should we expect a difference of .4 if we simply
separated two parts of the population arbitrarily?

If  the  observed  difference  is  greater  than  is  to  be  expected 
in  chance  selection,  it  is  said  to  be  significant,  i.e.  significant 
of  a  real  difference  between  the  phenomena. 

The general method of analysis is as follows: Suppose two
classes containing $n_1$ and $n_2$ things yield averages $x_1$ and $x_2$.
Calculate the standard deviation of the frequency curve of
the differences between the averages of $n_1$ things and $n_2$
things selected indiscriminately from the whole universe from
which the classes were segregated, and let this be $\sigma$.

Compare $x_1 \sim x_2$ with $\sigma$. The chance that the ratio is
greater than 3 is .0027, this being the sum of the integrals of

$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\, dz$ from 3 to $\infty$ and $-3$ to $-\infty$,

i.e. $2\{\frac{1}{2} - F(3)\} = 2(.5 - .49865) = .0027$ (p. 271).

Similarly the chances that the ratio is greater than 2 or 1
are .0456 or .3174; and it is just as likely as not that the
ratio is as great as .674. If, then, $x_1 \sim x_2$ is not greater than
$.674\sigma$, there is no evidence of a real difference, that is, a
difference due to the nature of the classes and not attributable
to chance deviation. As $x_1 \sim x_2$ increases beyond this, the
improbability of the result as a chance event increases, till
when the ratio equals 2 the odds are about 21 to 1 (.9544 to
.0456) against. At $2\sigma$ we may say that the event is improbable
unless the difference is real. At $3\sigma$ the odds against are about
370 to 1, and this is generally regarded as so improbable that
the difference $x_1 \sim x_2$ is spoken of as significant. At $4\sigma$ the
odds against are about 15,000 to 1. We can, of course, never
arrive at certainty by this method; we have rather to connect
the word significant with the scale of probability. In the
following paragraphs rules are given for calculating $\sigma$; in
every case the frequency group of the errors is normal, since
the conditions described in previous sections are satisfied, and
in every case the connection between $\sigma$ and the probability
of chance occurrence is that described in this paragraph.
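The scale of probability described here can be reproduced from the normal law with the standard library alone. The sketch below is an illustration, not part of the original tables; the function names are hypothetical, and the results agree with the figures in the text to about the fourth decimal place.

```python
import math

def two_sided_tail(k):
    """Chance that a normal deviation exceeds k standard deviations in either direction."""
    return math.erfc(k / math.sqrt(2.0))

def odds_against(k):
    """Odds against a chance deviation as large as k standard deviations."""
    p = two_sided_tail(k)
    return (1.0 - p) / p

for k in (0.674, 1, 2, 3, 4):
    print(k, round(two_sided_tail(k), 4), round(odds_against(k)))
```

At .674 the chance is very nearly one-half, and the odds at 2, 3 and 4 standard deviations come out near 21, 370 and 15,000 to 1.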


A. — Cases  of  the  Proportion  of  Things  with  Particular 
Characteristics  in  a  Universe. 

1. Let $N$ be the number of things in a universe, of which $pN$
have a particular characteristic where $p$ and $N$ are known. $q = 1 - p$.

Let $n$ be selected at random, and $p'n$ be found to have the
characteristic.

Then $\sigma$ for $p' \sim p$ is $\sqrt{pq\left(\frac{1}{n} - \frac{1}{N}\right)}$, or $\sqrt{\frac{pq}{n}}$ if $\frac{n}{N}$ is negligible.


Example. — If dice are thrown 1200 times and 6 turns up 180
times, $N = \infty$, $p = \frac{1}{6}$, $n = 1200$, $p' = \frac{3}{20}$, $\sigma = \sqrt{\frac{1}{6} \cdot \frac{5}{6} \cdot \frac{1}{1200}} = .0108$,

$\frac{p - p'}{\sigma} = \frac{.0167}{.0108} = 1.5$,

and there is an indication but no proof that the dice are not uniform
in respect of their 6 faces.
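The dice example can be worked through as follows. The helper is a hypothetical one written for this illustration; it implements the Case 1 formula, with the term in $1/N$ dropped when the universe is unlimited.

```python
import math

def sd_proportion(p, n, N=None):
    """S.D. of a sample proportion: sqrt(p*q*(1/n - 1/N)); N=None means an unlimited universe."""
    q = 1.0 - p
    correction = 1.0 / n - (1.0 / N if N else 0.0)
    return math.sqrt(p * q * correction)

p = 1 / 6            # chance of a six with uniform dice
n = 1200             # throws
p_obs = 180 / 1200   # observed proportion of sixes

sigma = sd_proportion(p, n)
ratio = abs(p - p_obs) / sigma
print(round(sigma, 4), round(ratio, 2))
```

The ratio is about one and a half standard deviations, which on the scale given above is an indication but not a proof.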

2. Let two samples $(n_1, p_1)$, $(n_2, p_2)$ be selected from the universe,
and neglect $\frac{n_1}{N}$ and $\frac{n_2}{N}$.

The standard deviations of $p_1 - p$ and $p_2 - p$ are

$\sqrt{\frac{pq}{n_1}}$ and $\sqrt{\frac{pq}{n_2}}$.

Hence the standard deviation of $p_1 \sim p_2 = (p_1 - p) \sim (p_2 - p)$
is the square root of the sum of the squares of their separate
standard deviations (formula (34)) and

$\sigma = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.

If $p$ is not known, but can only be deduced from the samples,
the best value to take seems to be that found by merging the
samples, viz.: $p\,(n_1 + n_2) = p_1 n_1 + p_2 n_2$.

Example. — In 1000 houses selected in a town, in 200 ($n_1$) the
head of the household is an artisan, in 800 ($n_2$) a labourer.
Children of school age are present in 80 of the first group ($p_1 = .4$)
and 420 of the second ($p_2 = .525$). (The numbers are hypothetical.)

$p \times 1000 = 80 + 420$, $\therefore p = \frac{1}{2} = q$.

$\sigma$ for $p_1 \sim p_2 = \sqrt{\frac{1}{4}\left(\frac{1}{200} + \frac{1}{800}\right)} = .04$.

$\frac{p_2 - p_1}{\sigma} = \frac{.525 - .4}{.04} = 3$ approx.

The difference is significant.
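A minimal sketch of this comparison, with the proportion for the universe found by merging the two samples as the text recommends. The variable names are illustrative only.

```python
import math

n1, n2 = 200, 800     # houses headed by artisans, by labourers
h1, h2 = 80, 420      # houses in each group with children of school age

p1, p2 = h1 / n1, h2 / n2
p = (h1 + h2) / (n1 + n2)   # merged estimate: p(n1 + n2) = p1*n1 + p2*n2
q = 1.0 - p

# S.D. of the difference between two sample proportions from one universe.
sigma = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))
ratio = (p2 - p1) / sigma
print(p, round(sigma, 4), round(ratio, 2))
```

The difference is a little over three times its standard deviation, which is the ground for calling it significant.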

3. The samples $(n_1, p_1)$, $(n_2, p_2)$ are selected from different unknown
universes, $n_1$ and $n_2$ being large.

E.g., suppose that out of 1000 men selected from each of two countries,
300 and 250 respectively are found to have blue eyes.

Here $p_1 = \frac{3}{10}$ in the selection, with $\sigma_1 = \sqrt{\frac{3 \times 7}{10^5}} = .014$, and
the value for the whole country (if the selection had nothing to do
with race or climate within the country) is approximately $p_1$.

Similarly in the other country it is approximately $p_2 = \frac{1}{4}$.

The standard deviation for $p_1 \sim p_2$ is that for the difference
between two independent groups, viz.:

$\sigma = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = .02$.

$\frac{p_1 - p_2}{\sigma} = \frac{.3 - .25}{.02} = 2.5$.

This method is generally used when the death-rates of two
occupational classes (e.g., miners and bricklayers) are compared.

If of $n_1$, $n_2$ under observation $m_1$ and $m_2$ die in a year, the
rates ($r_1$, $r_2$) are $\frac{m_1}{n_1} \times 1000$, $\frac{m_2}{n_2} \times 1000$, and $p_1 = \frac{m_1}{n_1}$, $p_2 = \frac{m_2}{n_2}$,
since in the absence of other evidence it is assumed that the risk is
the same throughout each class.
The miners are then assumed to be a random sample of a
universe of miners, and similarly with the bricklayers.

Then $\sigma$ for $r_1 - r_2$

$= 1000\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$.


A simpler procedure, however, is to compare each class
with the adult male population as a whole. Then to find if
the miners' death-rate differs from that of occupations in
general we should use Case 1.

In the preceding it has been assumed that the chance $p$
was the same throughout the universe. It may happen,
however,  that  the  universe  consists  of  different  regions  or 
strata  in  which  the  chances  are  different,  and  the  question 
arises  whether  we  should  proceed  at  random  in  the  selection 
of  a  sample  out  of  the  universe  as  a  whole,  or  whether  we 
should  partially  arrange  the  choice  so  as  to  take  the  same  pro¬ 
portion  out  of  each  region  or  stratum.  Mr.  Yule  ( Theory  of 
Statistics,  p.  281)  gives  a  formula  which  may  be  established 
as  follows. 


Let a universe contain $n_1, n_2, \ldots, n_t$ things in $t$ strata, and let the
numbers which have a certain characteristic be $p_1 n_1, p_2 n_2, \ldots, p_t n_t$
in these strata.

$N = n_1 + n_2 + \ldots + n_t$, and let $p_1 n_1 + p_2 n_2 + \ldots = PN$.

Let $k n_1, k n_2, \ldots, k n_t$ be examined in the $t$ strata, i.e. $kN = n$
in all.

Write $p_1 = P + d_1$, $p_2 = P + d_2$, $\ldots$,

where $P = p_1 \frac{n_1}{N} + p_2 \frac{n_2}{N} + \ldots$, so that $S(nd) = 0$.

The standard deviation of $p_1$ in the sample is $\sqrt{\frac{p_1 q_1}{k n_1}}$,
and similarly for $p_2$ etc.

Hence if $\sigma$ is the standard deviation for $P$ in the sample,
by formula (55),

$\sigma^2 = \frac{1}{N^2}\left(n_1^2 \frac{p_1 q_1}{k n_1} + n_2^2 \frac{p_2 q_2}{k n_2} + \ldots\right) = \frac{1}{k N^2}\,(n_1 p_1 q_1 + n_2 p_2 q_2 + \ldots)$.

$n_1 p_1 q_1 + n_2 p_2 q_2 + \ldots = S(np) - S(np^2) = NP - S\{n(P + d)^2\}$

$= NP - NP^2 - 2P \cdot S(nd) - S(nd^2) = NPQ - N\sigma_p^2$,

where $\sigma_p^2 = \frac{1}{N}\,(n_1 d_1^2 + n_2 d_2^2 + \ldots)$.

$\sigma^2 = \frac{PQ}{n} - \frac{\sigma_p^2}{n}$ . . . . (80)


Here $\sigma$ is the standard deviation for the observed result, $P$ is
the actual proportion in the universe, and $\sigma_p^2$ is the weighted mean
square of the deviations in the strata.

If we took the numbers at random through the universe the
standard deviation of the error would be $\sigma_0$, where $\sigma_0^2 = \frac{PQ}{n}$.

Hence $\sigma^2 = \sigma_0^2 - \frac{\sigma_p^2}{n}$, and by choosing proportionally from the
various strata the standard deviation of the error involved is
diminished.


In the investigation of the economic conditions of 4 towns
(Livelihood and Poverty), instead of numbering all the houses
and selecting 1 in 20 at random, we marked one out of every 20
throughout each street. By this means we secured that no
district was completely unrepresented, which may possibly
happen in a random selection, and we also got the advantage
indicated by the formula just given, since social conditions in
a street have a certain similarity. Suppose that there were
16,000 houses in 10 equal wards, and that in these wards the
proportions below some assigned standard were .02, .06, .10,
. . . .38. Then $N = 16000$; $n_1 = n_2 = \ldots = 1600$; $p_1 = .02$,
$p_2 = .06$, . . . $P = .2$; $d_1 = -.18$, $d_2 = -.14$, . . . ;

$\sigma_p^2 = \frac{1}{10}\,(.18^2 + .14^2 + \ldots)$, $\sigma_p = .115$.

Now suppose 80 houses were examined in each ward,
$n = 800$, $k = \frac{1}{20}$, $\sigma^2 = \frac{PQ - \sigma_p^2}{n}$, $\sigma = .0136$, and the result
may be written $.20 \pm .0136$, or $20 \pm 1.36$ per cent.

In a non-stratified selection we should have had $\sigma = .0141$.
The gain in precision is very slight, but the method of selection
by strata is in accordance with common sense and should be
used where it is applicable.
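The ward example follows directly from formula (80), and can be sketched as below. The list construction assumes, as the text implies, that the ten proportions rise by equal steps of .04 from .02 to .38; the names are illustrative.

```python
import math

ward_sizes = [1600] * 10
ward_props = [0.02 + 0.04 * i for i in range(10)]   # .02, .06, ..., .38 (assumed equal steps)
sampled_per_ward = 80

N = sum(ward_sizes)
n = sampled_per_ward * len(ward_sizes)

# Proportion in the whole universe and weighted mean square deviation of the strata.
P = sum(m * p for m, p in zip(ward_sizes, ward_props)) / N
var_p = sum(m * (p - P) ** 2 for m, p in zip(ward_sizes, ward_props)) / N

sigma_stratified = math.sqrt((P * (1 - P) - var_p) / n)   # formula (80)
sigma_random = math.sqrt(P * (1 - P) / n)                 # unrestricted random selection

print(round(math.sqrt(var_p), 3), round(sigma_stratified, 4), round(sigma_random, 4))
```

The two standard deviations differ only in the fourth decimal place, which is the slight gain referred to.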
B. — Case of a Universe Containing a Number of
Measurable Objects.

1. Let there be $N$ objects in the universe, the average of whose
measurements is $\bar{x}$ and standard deviation $s$.

$n$ are selected at random and their average is found to be $x_1$.

Then the standard deviation, $\sigma$, of $x_1 \sim \bar{x}$ is $s\sqrt{\frac{1}{n} - \frac{1}{N}}$, by
formula (52).

Example. — The average number of persons per tenement in a
town of 10,000 tenements is 4.5, with standard deviation 2.

In 1000 working-class tenements the average is 4.7.

Here $N = 10{,}000$, $n = 1000$, $\bar{x} = 4.5$, $x_1 = 4.7$, $s = 2$.

$\sigma = 2\sqrt{\frac{1}{1000} - \frac{1}{10000}} = .06$.


2. The universe is only known by a sample, $n$, $\bar{x}$, $s$. A sub-sample
of $n_1$ gives $x_1$, $\sigma_1$.

Let $n_2$, $x_2$, $\sigma_2$ be the residue, which, if the first sample were
random and not of a class with a special average, would also be an
independent random sample from the unknown universe.

Then $n_1 + n_2 = n$, $n_1 x_1 + n_2 x_2 = n\bar{x}$, and the standard deviation,
$\sigma$, of $x_1 \sim \bar{x}$ is given by

$\sigma^2 = \frac{\sigma_1^2}{n_1} + \frac{s^2 - 2\sigma_1^2}{n} - \frac{n_1}{n\,n_2}(\bar{x} - x_1)^2$,

as can be shown by eliminating $\sigma_2$ and $x_2$.

Let $n_1$ be less than $n_2$, and $n$ and $n_2$ great.

Then $(\bar{x} - x_1)\sqrt{n_1}$ is of order $\sigma\sqrt{n_1}$, i.e. of $\sigma_1$. Hence the term
in $(\bar{x} - x_1)^2$ is negligible, and

$\sigma^2 = \frac{\sigma_1^2}{n_1} + \frac{s^2 - 2\sigma_1^2}{n}$ . . . . (81)

See Biometrika, Vol. V., p. 182.
This method is used in the Scotch Census (Cd. 7163, p. 288)
for comparing the size of families of men in different occupations.

If $\frac{n_1}{n}$ is small, $\sigma = \frac{\sigma_1}{\sqrt{n_1}}$, as it should from Case 1 when
$\frac{n}{N}$ is neglected, and the observed $\sigma_1$ is taken for the unknown $s$.

If $\sigma_1 = s$, as will be the case if the standard deviation is
not affected by class, but only the average affected,

$\sigma = \sigma_1\sqrt{\frac{1}{n_1} - \frac{1}{n}}$, as was to be expected.


Example from the Scotch Census.
$n$, total number of marriages, = 133,960.

$\bar{x}$, average number of children per marriage, = 5.82, with
$s = 3.099$.

Among boiler-makers, $n_1 = 923$, $x_1 = 6.00$, $\sigma_1 = 3.039$.

$\sigma^2 = \frac{9.24}{923} + \frac{9.60 - 18.46}{133960}$, $\sigma = .10$.

$\frac{x_1 - \bar{x}}{\sigma} = \frac{.18}{.10} = 1.8$, which is barely significant.


Example. — Among the flour prices tabulated on p. 311 for U.S.A.,
142 came from North Atlantic States.

                    Number.        Average.            Standard Deviation.
U.S.A.              n   = 267      x̄   = 2.625         s   = .293
North Atlantic      n_1 = 142      x_1 = 2.748         σ_1 = .244

$\sigma^2 = \frac{.0595}{142} + \frac{.0858 - .1190}{267}$, $\sigma = .017$.

$\frac{x_1 - \bar{x}}{\sigma} = \frac{.123}{.017} = 7$ approx., and the price in the North Atlantic
States was definitely higher than the average for the whole country.
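Both examples of Case 2 can be checked against formula (81) in the form $\sigma^2 = \sigma_1^2/n_1 + (s^2 - 2\sigma_1^2)/n$. The function below is a hypothetical helper written for this illustration; the figures are those quoted in the two examples.

```python
import math

def sd_subsample_mean(n, s, n1, sigma1):
    """S.D. of (sub-sample average - whole-sample average), formula (81)."""
    return math.sqrt(sigma1**2 / n1 + (s**2 - 2.0 * sigma1**2) / n)

# Scotch Census: all marriages against boiler-makers.
sd_boiler = sd_subsample_mean(n=133960, s=3.099, n1=923, sigma1=3.039)
ratio_boiler = (6.00 - 5.82) / sd_boiler

# Flour prices: U.S.A. against North Atlantic States.
sd_flour = sd_subsample_mean(n=267, s=0.293, n1=142, sigma1=0.244)
ratio_flour = (2.748 - 2.625) / sd_flour

print(round(sd_boiler, 3), round(ratio_boiler, 1))
print(round(sd_flour, 3), round(ratio_flour, 1))
```

The second term matters little when the sub-sample is a small fraction of the whole (boiler-makers), but reduces $\sigma$ appreciably when the sub-sample is more than half of it (flour prices).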

3. Two samples $(n_1, x_1, \sigma_1)$ and $(n_2, x_2, \sigma_2)$ are taken out of two
known universes $(N_1, \bar{x}_1, s_1)$ and $(N_2, \bar{x}_2, s_2)$.

$\sigma$ for $x_1 \sim x_2$ is then

$\sqrt{s_1^2\left(\frac{1}{n_1} - \frac{1}{N_1}\right) + s_2^2\left(\frac{1}{n_2} - \frac{1}{N_2}\right)}$,

since we have the difference between two independent observations,
each coming under Case 1.

If the universes are only known from samples, and $\frac{n_1}{N_1}$, $\frac{n_2}{N_2}$ are
small we must take

$\sigma^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ . . . . (82)

There are some other variants, which can be treated on the
same principles.

Example. — In food expenditures, similar to those tabulated on
p. 310, in the whole group we have $\bar{x} = 10.3$ (shillings), $s = 3.3$.

$x_1$ for the families of 566 skilled workmen was 10.9, and $x_2$ for
the families of 266 unskilled was 9.3.

The standard deviations for these groups were not calculated,
but were probably nearly the same as $s$.

$\sigma$ for $x_1 \sim x_2$ is $3.3\sqrt{\frac{1}{566} + \frac{1}{266}} = .25$.

$\frac{x_1 - x_2}{\sigma} = \frac{1.6}{.25} = 6$ approx., and the difference is significant.


The stratification of a universe of measurable objects is
also treated by Mr. Yule (Theory, p. 345).

Let a universe $(N, \bar{x}, s)$ be composed of groups $(n_1, x_1, s_1)$, $(n_2, x_2, s_2)$, $\ldots$,
and let $k n_1, k n_2, \ldots$ be selected from the groups, and the averages
be found to be $(x_1 + \delta_1)$, $(x_2 + \delta_2)$, $\ldots$, and the average of the $kN$
to be $\bar{x} + D$; $kN = n$.

Then $N = n_1 + n_2 + \ldots$; $N\bar{x} = n_1 x_1 + n_2 x_2 + \ldots$

Write $x_1 = \bar{x} + d_1$, $x_2 = \bar{x} + d_2$, $\ldots$;
then $S(nd) = 0$,

$N s^2 = n_1 (s_1^2 + d_1^2) + n_2 (s_2^2 + d_2^2) + \ldots$

The squares of the standard deviations for $\delta_1, \delta_2, \ldots$ are

$\frac{s_1^2}{k n_1}$, $\frac{s_2^2}{k n_2}$, $\ldots$

Write $\sigma$ for the standard deviation of $D$.

$kN(\bar{x} + D) = k n_1 (x_1 + \delta_1) + k n_2 (x_2 + \delta_2) + \ldots$

$D = \frac{n_1}{N}\delta_1 + \frac{n_2}{N}\delta_2 + \ldots$

$\sigma^2 = \left(\frac{n_1}{N}\right)^2 \frac{s_1^2}{k n_1} + \left(\frac{n_2}{N}\right)^2 \frac{s_2^2}{k n_2} + \ldots$, by formula (55),

$= \frac{1}{k N^2}\,(n_1 s_1^2 + n_2 s_2^2 + \ldots)$.

Write $\sigma_0$ for the standard deviation of the average if the $n$
samples had been taken at random from the universe as a whole.

Then $\sigma_0^2 = \frac{s^2}{n} = \frac{1}{nN}\{n_1 (s_1^2 + d_1^2) + n_2 (s_2^2 + d_2^2) + \ldots\} = \sigma^2 + \frac{1}{n} \cdot \frac{S(nd^2)}{N}$.

Write $N\sigma_m^2 = S(nd^2)$, so that $\sigma_m^2$ is the weighted mean square
of the deviations of the averages in the strata.

Then $\sigma^2 = \sigma_0^2 - \frac{\sigma_m^2}{n}$ . . . . (83)

The precision of the average is improved by stratification, as in
the previous case (formula (80)).

Thus in the example on p. 313, let $N$ be the total number
of tenements in the last seven districts named (Spitalfields
and onwards). $\bar{x}$, the average number of persons per tenement
in the districts combined, is found from the Census to be
4.64, with $s = 2.75$. $k = \frac{1}{50}$, since one tenement in 50 was
recorded in the selection; $N = 57000$, and $n = 1140$.

                      Number of            Persons per
                      tenements.           tenement.
Spitalfields          n_1 =  6500          + .15 = d_1
Whitechapel           n_2 =  5900          + .08 = d_2
St. George            n_3 =  9400          + .24 = d_3
Shadwell              n_4 =  4800          - .27 = d_4
Limehouse             n_5 =  6600          - .10 = d_5
Mile End, S.W.        n_6 = 13400          + .07 = d_6
Mile End, N.E.        n_7 = 10400          - .24 = d_7
                            -----
                            57000

$S(nd^2) = 1806$; $\sigma_m^2 = \frac{1806}{57000} = .0317$.

$\sigma_0^2 = \frac{(2.75)^2}{1140} = .006634$, $\sigma_0 = .08145$.

$\sigma^2 = .006634 - \frac{.0317}{1140} = .006606$, $\sigma = .08128$.

The improvement obtained by sampling in the seven strata
represented by the districts is very slight.
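The arithmetic of this example can be retraced from formula (83). In the sketch below the numbers of tenements are written out in full so that they add to the stated total of 57,000; this reading of the table, and the variable names, are assumptions made for the illustration.

```python
import math

# Tenements per district and deviation of each district average from 4.64.
sizes = [6500, 5900, 9400, 4800, 6600, 13400, 10400]
devs = [0.15, 0.08, 0.24, -0.27, -0.10, 0.07, -0.24]
s = 2.75          # standard deviation of persons per tenement, all districts
one_in = 50       # one tenement in 50 recorded

N = sum(sizes)
n = N // one_in

s_nd2 = sum(m * d * d for m, d in zip(sizes, devs))   # S(n d^2)
var_m = s_nd2 / N                                     # sigma_m^2
var_random = s**2 / n                                 # sigma_0^2
var_stratified = var_random - var_m / n               # formula (83)

print(N, n, round(s_nd2), round(var_m, 4))
print(round(math.sqrt(var_random), 5), round(math.sqrt(var_stratified), 5))
```

The district averages differ so little from one another, compared with the dispersion within districts, that stratification changes the standard deviation only in the fourth decimal place.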


Existence  of  a  Trend. 

Further  applications  of  the  same  principles  are  made  when 
we  consider  a  time-series  of  observations  and  examine  whether 
the  fluctuations  and  movements  are  random  or  show  the 
existence  of  a  trend  or  of  periodicity. 
The method, and its difficulties, can be shown sufficiently
by  two  examples. 


1. — The Recorded Times for “The Oaks” from 1850 to 1899 are as Shown Below

Year  min. sec.     Year  min. sec.     Year  min. sec.     Year  min. sec.     Year  min. sec.
1850   2   56       1860   2   56       1870   2   52       1880   2   49       1890   2   4[?]
1851   2   52       1861   2   44       1871   2   51       1881   2   46       1891   2   54[?]
1852   3    0       1862   2   49       1872   2   52       1882   2   49       1892   2   43[?]
1853   2   52       1863   2   54       1873   2   50[?]    1883   2   53       1893   2   44[?]
1854   3    0       1864   2   47       1874   2   [?]      1884   2   49       1894   2   50
1855   2   58       1865   2   51       1875   2   49½      1885   2   43[?]    1895   2   48[?]
1856   3    4       1866   2   53       1876   2   50       1886   2   54[?]    1896   2   45[?]
1857   2   50       1867   2   54       1877   2   54[?]    1887   2   50[?]    1897   2   45
1858   2   53½      1868   2   47½      1878   2   54       1888   2   42[?]    1898   2   45[?]
1859   2   55       1869   2   59       1879   3    2       1889   2   45       1899   2   44

Ten-yearly
average
       2   56.05           2   51.45           2   52.395          2   48.22           2   46.26


These figures fit fairly well a normal curve with average
2 min. 50.87 secs. and standard deviation 5.20 secs. The
standard deviation for the difference between two records is
therefore $5.2\sqrt{2} = 7.4$ secs. This is only exceeded eleven times
between consecutive years, and no difference between consecutive
years reaches twice this; hence there is no proof of any
sudden change having taken place between two races. The
difference between some of the times for years early in the
period and those later in some cases exceeds 20 seconds. The
standard deviation for the difference between the averages for
two periods of ten years is $5.2\sqrt{\frac{2}{10}} = 2.33$ secs. The
difference between the averages for 1850-9 and 1890-9 is nearly
10 seconds, and is significant, as is the difference between the
averages for 1850-9 and 1880-9. The intermediate differences
are hardly significant. Hence we find that some cause was at
work which gradually quickened the race between the fifties and
the eighties.


2. — The Marriage Rates for England and Wales from 1860 to 1909 were

1860  17.1      1870  16.1      1880  14.9      1890  15.5      1900  16.0
1861  16.3      1871  16.7      1881  15.1      1891  15.6      1901  15.9
1862  16.1      1872  17.4      1882  15.5      1892  15.4      1902  15.9
1863  16.8      1873  17.6      1883  15.5      1893  14.7      1903  15.7
1864  17.2      1874  17.0      1884  15.1      1894  15.0      1904  15.3
1865  17.5      1875  16.7      1885  14.5      1895  15.0      1905  15.3
1866  17.5      1876  16.5      1886  14.2      1896  15.7      1906  15.7
1867  16.5      1877  15.7      1887  14.4      1897  16.0      1907  15.9
1868  16.1      1878  15.2      1888  14.4      1898  16.2      1908  15.1
1869  15.9      1879  14.4      1889  15.0      1899  16.5      1909  14.7

Ten-yearly
average
      16.70           16.33           14.86           15.56           15.55

The average for 50 years is 15.80, and the standard deviation
for the 50 records is .89, and, taken irrespective of order,
the distribution is nearly normal. There are no sudden jumps
from one year to the next. The standard deviation for the
difference between two averages of ten years is .4, and hence
the fall from 1870-9 to 1880-9 and the subsequent rise is
significant.

The first twenty-five years show greater variation than the
second twenty-five, and we can make a finer test.


          1860-1884.                              1885-1909.
$\sigma = .894$, $\sigma\sqrt{\frac{2}{5}} = .566$.          $\sigma = .613$, $\sigma\sqrt{\frac{2}{5}} = .388$.

                    Average.                                Average.
1860-4              16.70               1885-9              14.50
1865-9              16.70               1890-4              15.24
1870-4              16.96               1895-9              15.88
1875-9              15.70               1900-4              15.76
1880-4              15.22               1905-9              15.34


There  is  a  significant  fall  from  1870-4  to  1885-9  and  a 
significant  rise  from  1885-9  to  1895-9. 

The  argument  should  be  illustrated  by  a  diagram,  which 
will  suggest  to  what  periods  the  test  should  be  applied. 
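The test applied to the marriage rates can be retraced from the table as follows. This is a sketch of the method only; the list repeats the fifty rates in order from 1860, and the names are illustrative.

```python
import math
from statistics import mean, pstdev

rates = [
    17.1, 16.3, 16.1, 16.8, 17.2, 17.5, 17.5, 16.5, 16.1, 15.9,   # 1860-9
    16.1, 16.7, 17.4, 17.6, 17.0, 16.7, 16.5, 15.7, 15.2, 14.4,   # 1870-9
    14.9, 15.1, 15.5, 15.5, 15.1, 14.5, 14.2, 14.4, 14.4, 15.0,   # 1880-9
    15.5, 15.6, 15.4, 14.7, 15.0, 15.0, 15.7, 16.0, 16.2, 16.5,   # 1890-9
    16.0, 15.9, 15.9, 15.7, 15.3, 15.3, 15.7, 15.9, 15.1, 14.7,   # 1900-9
]

decade_means = [mean(rates[i:i + 10]) for i in range(0, 50, 10)]
sd_all = pstdev(rates)                  # S.D. of the 50 records about their average
sd_diff = sd_all * math.sqrt(2 / 10)    # S.D. of the difference of two ten-year averages

# Differences between consecutive ten-year averages, in units of their S.D.
ratios = [(b - a) / sd_diff for a, b in zip(decade_means, decade_means[1:])]
print([round(m, 2) for m in decade_means], round(sd_all, 2), round(sd_diff, 2))
print([round(r, 2) for r in ratios])
```

Each ratio is then read against the scale of probability given earlier in the chapter; the same steps with five-year groups and $\sqrt{2/5}$ give the finer test.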


Periodicity. 


The  general  question  of  the  existence  of  a  period  of  a  length 
not  predetermined  is  a  mathematical  problem,  that  of 
harmonic  analysis,  and  is  not  suitable  for  discussion  here  ;  but 
we  can  test  the  influence  of  periodicity  if  the  length  of  the 
period  is  given. 

Take the case of a given interval, say one year where the
records are monthly, and consider whether the differences
between, say, January and February are such as might occur
in a random choice of observations irrespective of time. Suppose
the records extend over $t$ years, so that there are $12 \times t$ in
all, that their average is $\bar{x}$ and their standard deviation from
the average $\sigma = \sqrt{\frac{S(a - \bar{x})^2}{12t}}$, where $a$ stands for any observation.

The standard deviation of the difference between two
averages each of $t$ records selected at random is $\sigma\sqrt{\frac{2}{t}}$,
and the chance of exceeding this deviation is found from the
table of normal probability (p. 271), if the records regarded
as a group are nearly normal, or otherwise satisfy the conditions
of p. 299. If instead of taking random selections we
compare the average of the $t$ January records with that of the
February records and find that the difference exceeds twice
or three times $\sigma\sqrt{\frac{2}{t}}$, and similarly for other months, then we
have evidence that the quantities measured are affected by
the time of year, unless the records of a month include some
quite abnormal entry.

It is not, however, easy in this method to include all the
evidence. Thus if we take the 180 records of unemployment
on p. 161 we find that the average is 4.269 and the standard
deviation is 1.924. The standard deviation for the difference
between two averages of 15 is therefore $1.924\sqrt{\frac{2}{15}} = .70$. This

is  exceeded,  but  not  greatly,  when  we  compare  the  averages  for 
January  or  December  with  those  for  April,  May,  June  or  July, 
and  there  is  no  other  difference  which  might  not  arise  in 
random  selection.  There  is,  however,  cumulative  evidence 
which  can  hardly  be  measured.  Thus  the  averages  fall  from 
December  through  January,  February,  March  (if  we  omit 
the  abnormal  entry  in  1912),  and  April  to  May,  and  rise  month 
by  month  from  May  to  October.  This  suggests  a  wave  motion 
which  the  method  here  suggested  is  incapable  of  measuring. 

Another method, also difficult to make precise, is to compare
the numbers of falls and rises from an assigned month to the
next.

         Jan.   Feb.   Mar.   Apr.   May    June   July   Aug.   Sept.  Oct.   Nov.   Dec.
         to     to     to     to     to     to     to     to     to     to     to     to
         Feb.   Mar.   Apr.   May.   June.  July.  Aug.   Sept.  Oct.   Nov.   Dec.   Jan.
Falls    12     13     10     10     5½     7      2      7½     8½     10     15     10
Rises    3      2      5      5      9½     8      13     7½     6½     5      0      4


Thus in 12 years the February number was less than that
for the preceding January, and in 3 years it was greater.
Where the numbers are equal, ½ is counted for each row.
Now in 15 trials in each of which + and − are equally likely,
the chance of obtaining 10 or more of like sign is about .3,
so that the movements March to April, April to May, October
to November are not very improbable in a random selection.
The chance of obtaining 12 or more of the same sign is only
about .035, and the movements from January to February, February to
March, July to August, November to December, would hardly
occur, if there were no influence from the season.
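The chances used in this counting of falls and rises are simple binomial sums, and may be computed exactly. The function name below is hypothetical; it gives the chance that at least k of n equally likely signs are alike, counting both directions.

```python
from math import comb

def chance_like_sign(n, k):
    """Chance that k or more of n trials (+ and - equally likely) share the same sign; needs k > n/2."""
    one_direction = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
    return 2 * one_direction

print(round(chance_like_sign(15, 10), 3))   # ten or more alike out of 15
print(round(chance_like_sign(15, 12), 3))   # twelve or more alike out of 15
```

For the haddock records, which run over 18 years, the same function with n = 18 (or 17 for December to January) gives the corresponding chances.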

The  conclusion  seems  to  be  that  there  is  a  cumulative 
decrease  from  November  to  March  or  April  or  May,  and  a 
cumulative  rise  during  the  early  summer. 

Another example gives more definite results. The catch of
haddocks is recorded (North Sea Fisheries Investigation, Granton)
monthly for 18 years, the unit being
1 cwt. per month per vessel. The average is 172 and the
standard deviation of the 216 records is about 108.

The standard deviation for the difference between the
average of one month compared with the average of all is
$108\sqrt{\frac{1}{18} + \frac{1}{216}} = 26.5$ approx., and for the difference between
the averages of two months is $108\sqrt{\frac{2}{18}} = 36$, in both cases if
the selections were random and there were no seasonal
influence.

The averages recorded are:

January    101    April    83    July        247    October    227
February   115    May     145    August      282    November   181
March      125    June    196    September   267    December   101
                                                    Year       172

Here  January,  February,  April  and  December  are  more 
than  twice  27  below  the  average,  March  and  May  are  not  less 
than  27  below  the  average,  each  month  from  July  to  October 
is  more  than  twice  27  above  the  average,  June  and  November 
are  within  27  of  the  average.  The  conclusion  is  definitely 
that  the  season  July  to  October  is  better  than  the  season 
December  to  April. 

Also  the  movements  between  consecutive  months  are  more 
than  36  in  the  following  cases  :  March  to  April,  April  to  May, 
May  to  June,  June  to  July,  September  to  October,  October  to 
November,  and  November  to  December.  April  is  clearly  the 
worst  month,  but  it  is  doubtful  whether  August  is  established 
as  the  best. 
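
The comparisons made in the last three paragraphs can be repeated mechanically. This is only a sketch using the figures quoted in the text (standard deviation 108, 18 years, the twelve monthly averages); the variable names are illustrative, and the limit of 27 is the rounded value the text itself uses.

```python
from math import sqrt

SIGMA = 108          # standard deviation of the 216 monthly records
YEARS = 18           # records available for each calendar month
YEAR_AVERAGE = 172

averages = {
    "Jan": 101, "Feb": 115, "Mar": 125, "Apr": 83, "May": 145, "June": 196,
    "July": 247, "Aug": 282, "Sept": 267, "Oct": 227, "Nov": 181, "Dec": 101,
}

# Standard deviations under random selection with no seasonal influence.
sd_month_vs_all = SIGMA * sqrt(1 / YEARS + 1 / (12 * YEARS))   # about 26.5
sd_two_months = SIGMA * sqrt(2 / YEARS)                        # 36

limit = 27  # rounded value used in the text
well_below = [m for m, v in averages.items() if YEAR_AVERAGE - v > 2 * limit]
well_above = [m for m, v in averages.items() if v - YEAR_AVERAGE > 2 * limit]

# Movements between consecutive months that exceed 36.
months = list(averages)
large_moves = [
    f"{a} to {b}"
    for a, b in zip(months, months[1:])
    if abs(averages[b] - averages[a]) > 36
]

print(well_below)   # ['Jan', 'Feb', 'Apr', 'Dec']
print(well_above)   # ['July', 'Aug', 'Sept', 'Oct']
print(large_moves)
```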

From  the  original  figures  we  have 


Interval          Number of Falls   Number of Rises

Jan. to Feb.            5½               12½
Feb. to Mar.           10                 8
Mar. to Apr.           16                 2
Apr. to May             4                14
May to June             4                14
June to July            7                11
July to Aug.            4                14
Aug. to Sept.          11                 7
Sept. to Oct.          11                 7
Oct. to Nov.           11                 7
Nov. to Dec.           15                 3
Dec. to Jan.            8                 9

Here a number greater than 11 or less than 7 is likely to be
significant.


Notes. 


1. The standard deviation of an average is often given as $\sigma/\sqrt{n-1}$
instead of $\sigma/\sqrt{n}$ as on p. 289, on the ground that we should
distinguish between the deviation from the unknown true average and that
from the average of observations.

Let $x_0$ be the true average of a group whose standard deviation is $\sigma_0$,
and let n things be selected from it which give an average $\bar{x}$, and which
separately are $X_1, X_2 \ldots$ with standard deviation $\sigma$.

Write $\bar{x} = x_0 + d$.

The deviations of $X_1, X_2 \ldots$ from the true average are $X_1 - x_0$, $X_2 - x_0 \ldots$,
and the standard deviation of these is by hypothesis $\sigma_0$.

Hence $\sigma_0^2 = \text{Mean}\,(X - x_0)^2 = \text{Mean}\,(X - \bar{x} + d)^2 = \text{Mean}\,(X - \bar{x})^2 + d^2 = \sigma^2 + \dfrac{\sigma_0^2}{n}$,
since from formula (38) the standard deviation of the average is $\dfrac{\sigma_0}{\sqrt{n}}$,

and $\dfrac{\sigma_0}{\sqrt{n}} = \dfrac{\sigma}{\sqrt{n-1}} = \sqrt{\dfrac{S(X - \bar{x})^2}{n(n-1)}}$.

Hence the observed $\sigma$ should be divided by $\sqrt{n-1}$, not $\sqrt{n}$.

The modification is only of theoretic importance, for the difference is only
perceptible with quite small values of n, and $\sigma$ is liable to an error of the
same order as this difference in any case.
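
The relation in Note 1 can be verified on any small set of numbers. The sketch below uses Python's standard `statistics` module; the six observations are illustrative values only, not data from the text. `pstdev` divides by n (the observed σ of the note) and `stdev` divides by n − 1.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

observations = [101, 115, 125, 83, 145, 196]   # illustrative values only
n = len(observations)

sigma = pstdev(observations)                  # observed sigma, divisor n
se_note = sigma / sqrt(n - 1)                 # form argued for in Note 1
se_alternative = stdev(observations) / sqrt(n)  # divisor n - 1 placed inside sigma

# Both routes give sqrt(S(X - mean)^2 / (n (n - 1))).
print(se_note, se_alternative)
print(isclose(se_note, se_alternative))       # True
```

With n as small as 6 the two divisors differ by about ten per cent; for large n the difference disappears, as the note says.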

2. When as on p. 159 and pp. 375, 387 we measure the deviation of an
observation in a time series from the average of t years of which it is the
centre, we ought to pay attention to the risk of error due to averaging,
measured by $\sigma/\sqrt{t}$, where $\sigma$ is the standard deviation of the
observations in neighbouring years. The standard deviation of the difference
between an observation and such an average is not $\sigma$ but
$\left(\sigma^2 + \dfrac{\sigma^2}{t}\right)^{1/2} = \sigma\sqrt{\dfrac{t+1}{t}}$.
Since t is small, the error is perceptible, and the deviations as shown on
such a diagram as that facing p. 155 are imperfectly estimated, and the
measurement of correlation on pp. 386-7 lacks precision.


CHAPTER  V. 


EMPIRICAL  FREQUENCY  EQUATIONS. 

It cannot be assumed that frequency groups in general are
expressible by the law of great numbers, for the particular
complex of independent causes which leads to its equation
cannot be postulated for observational groups in general. The
main use of the normal curve is in its application to averages
or other functions whose method of generation is known.
Its applicability to anthropometrical or biometrical groups
must be verified for each class of measurements, and the
question whether mental and moral characteristics are normally
distributed needs special investigation. There is, however, a
presumption that in very many classes the normal distribution
represents fairly the central portion of a group (from the
centre to once or twice the standard deviation) and that the
chance of an observation differing from the average by more
than twice the standard deviation is not large, and consequently
the table of normal frequency affords some guidance
even in non-normal cases.

For  complete  description  of  groups  either  a  more  elastic 
system  is  needed  to  include  wider  classes  than  are  covered  by 
the  curve  of  error,  or  equations  on  an  empirical  basis  should 
be  found  to  fit  special  classes  of  observations.  In  this  chapter 
we  deal  very  briefly  with  equations  that  serve  one  or  other  of 
these  purposes. 

The general method is to select a mathematical equation
involving 2, 3 or 4 unknown constants, the constants being so
chosen as to make the curve represented by the equation fit
the diagram formed from the observations; the number of
points on the diagram being more numerous than the number
of constants, we obtain more equations than unknowns and
the best solution has to be chosen. A usual way of meeting
such a difficulty is by the method of least squares (p. 452),
but with observational frequency curves Professor Pearson's
method is generally used, equating the moments deduced
mathematically from the equation of the curve to the moments
obtained (as in Chap. I, p. 253) from the observations. This
method has already been used (p. 305) when the average,
standard deviation, and skewness ($\bar{x}$, $\sigma$, $\kappa$) have been obtained
from the first three moments of the observations, and it is
always used in the system described in the next paragraph.
Other methods are to obtain those constants which satisfy
the condition that the observations would be found in a random
sample with minimum improbability or to select a small number
of chosen points at which the equations shall be exactly
satisfied.


Professor Karl Pearson's System.


It is necessary to call attention to the system of curves
introduced by Professor Karl Pearson, since the notation
involved has become general in statistical investigations, and
it is advisable to indicate their relationship to the present
treatment. For a detailed treatment, however, the reader is
referred to Mr. Elderton's book, Frequency Curves and Correlation,
and Mr. Hardy's Theory of the Construction of Tables of
Mortality.

$\mu_0, \mu_1, \ldots \mu_t \ldots$ are used to denote the successive moments
of a frequency curve. $\mu_0$, the area, is taken as unity; $\mu_1$ is zero,
if the curve is referred to the ordinate through the centre of
gravity of the curve. When this is the case, $\sigma$, the standard
deviation, is defined as $\sqrt{\mu_2}$. $\beta_1$ is written for $\dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2$ for $\dfrac{\mu_4}{\mu_2^2}$.

The equation

$D_x y = \dfrac{(x + a)\,y}{b_0 + b_1 x + b_2 x^2}$ . . . . (84)

is the basis of the analysis.

This satisfies the condition that the curve should touch the
axis when y = 0 and also be horizontal at one other position,
namely when x = −a. That is, the curve has one mode.

It is found in practice that it is useless to continue the
denominator after the term $b_2x^2$.
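
As a small illustration of the notation, $\beta_1$ and $\beta_2$ can be computed from a list of observations by taking moments about the average. The sketch below assumes ungrouped data; the function name `pearson_betas` and the sample values are illustrative and not from the text.

```python
def pearson_betas(values):
    """Return (beta1, beta2) from moments taken about the average."""
    n = len(values)
    mean = sum(values) / n

    def moment(order):
        # mu_order: mean of the deviations raised to the given power
        return sum((v - mean) ** order for v in values) / n

    mu2, mu3, mu4 = moment(2), moment(3), moment(4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


beta1, beta2 = pearson_betas([1, 2, 3, 4, 5])
print(beta1, beta2)  # symmetrical data give beta1 = 0
```

For the normal curve the corresponding values are $\beta_1 = 0$ and $\beta_2 = 3$, so departures from these indicate skewness or a different degree of peakedness.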

The integration of this equation leads to the three alternative
general forms:

$y = y_0\left(1 + \dfrac{x}{a_1}\right)^{\nu a_1}\left(1 - \dfrac{x}{a_2}\right)^{\nu a_2}$,

$y = y_0\left(1 + \dfrac{x^2}{a^2}\right)^{-m} e^{-\nu\tan^{-1}\frac{x}{a}}$,

and

$y = y_0\,(x - a)^{q_2}\,x^{-q_1}$,

where $y_0$ and the sets of three constants $(\nu, a_1, a_2)$, $(m, a, \nu)$,
$(a, q_2, q_1)$, are determinable by means of moments from
$a, b_0, b_1, b_2$ in the basic equation. Mr. Hardy gives an alternative
method of analysis based on an apparently simpler
notation.

When there are special values of $a, b_0, b_1, b_2$ or special relations
between them, simpler equations involving only two
constants, or even only one, are obtained. In all, seven
principal types are distinguished, and Mr. Elderton shows how
each can be fitted to appropriate observational frequency
groups. The algebra and the arithmetic involved are somewhat
heavy. The results of the application of the method to
food expenditure are given on p. 310.

The equation of the normal curve of error can be written
in the form

$D_x y = -\dfrac{x\,y}{\sigma^2}$ . . . . (85)

and is one of the special types.

The  second  approximation  to  the  general  curve  of  error 
gives 


a2  + 


kctX 

2 


(86) 


where $\kappa^2$ is neglected, and is also a special case.

It has been found, especially by Professor K. Pearson and
his co-workers, that unimodal observational frequency groups
can very generally be represented adequately by one or other
of the variants of the formula. Hence the calculation of the
average, $\sigma$, $\beta_1$, and $\beta_2$ from the observations forms a general
and useful way of expressing a group by four intelligible
quantities, carrying further the process by which an average
is commonly taken as representing a group.

From these quantities the equation of the curve representing
the group can be deduced in its appropriate form, and
then it is possible to interpolate values of y for any value of x,
whatever the grading of the observations may be.

It is not proposed here to discuss how far these equations
can be used in questions of probability, nor to consider how far
the fundamental formula is empirical and how far it is dependent
on hypotheses of chance generation.


Professor Edgeworth's Method.

Professor Edgeworth has developed a formula based on a
transformation of the normal curve of error which represents
classes of cases whose skewness is too great to allow them to be
included under the second approximation of the generalised
law of error. It has not yet been tried sufficiently to decide
how far it is useful for description, interpolation, or other
purposes. (See Statistical Journal, two series of papers,
commencing December, 1898, and July, 1916, respectively.)


Professor  Pareto's  Equation. 

The equation $D_x y = -\dfrac{my}{x}$, obtainable from the system
described above by taking the case where $b_0 = 0$ and
$b_2 = \dfrac{b_1}{a} = -\dfrac{1}{m}$, represents a curve which slopes downwards to
the right for all possible values of x, when m is positive.

In its integral form it is $\log y = -m\log x + \text{const.}$,
or $y = Cx^{-m}$.

The area of the curve from x to ∞ is

$z = \int_x^{\infty} Cx^{-m}\,dx = \left[\dfrac{Cx^{1-m}}{1-m}\right]_x^{\infty} = \dfrac{C}{(m-1)\,x^{m-1}}$, if m > 1.

Write $\alpha$ for $m - 1$ and A for $\dfrac{C}{m-1}$, and we have

$y = \dfrac{A\alpha}{x^{\alpha+1}}$, $z = \dfrac{A}{x^{\alpha}}$ . . . . (87)

The last equation is the simplest form of "Pareto's Law"
for incomes. Here A and $\alpha$ are constants and z is the aggregate
number of persons whose incomes are at or above £x (or x
francs, etc.).

Other  groups,  e.g.  the  number  of  houses  of  various  annual 
values,  where  the  number  of  instances  and  the  variable  are 
capable  of  very  wide  ranges  of  values,  and  which  are  suitable 
for  graphing  on  double  logarithmic  scales,  are  also  found  to 
conform  to  the  same  formula. 

In the case of incomes, the aggregate of incomes from £$x_1$
to £$x_2$ is

$\int_{x_1}^{x_2} xy\,dx = \dfrac{A\alpha}{\alpha-1}\left(\dfrac{1}{x_1^{\alpha-1}} - \dfrac{1}{x_2^{\alpha-1}}\right)$.

The number of incomes in the range is $A\left(\dfrac{1}{x_1^{\alpha}} - \dfrac{1}{x_2^{\alpha}}\right)$.

The law is not generally found applicable to very low or
very high incomes. If it did extend to the maximum income,
we should have

Aggregate income at or above £x = $\dfrac{A\alpha}{(\alpha-1)\,x^{\alpha-1}}$,

Number of incomes at or above £x = $\dfrac{A}{x^{\alpha}}$ = N, say,

and hence

Average income from £x upwards = $\dfrac{\alpha}{\alpha-1}\cdot x$,

and these equations would give A and $\alpha$ immediately from
records of incomes.

Pareto's equation fits the statistics of incomes of 1911-12
paying super-tax very well over the range £5,000 to £55,000;
above the latter income it gives numbers in excess of those
recorded.

$\alpha = 1.5$, $\log A = 9.618$ are found to give a close fit.

Range of Incomes        Number of Incomes.
(000's).                Calculated.   Recorded.

£5 to £10                  7,546        7,411
10 to 15                   1,890        2,029
15 to 20                     790          787
20 to 25                     424          438
25 to 35                     411          382
35 to 45                     199          186
45 to 55                     103          107
55 to 65                      70           56
65 to 75                      50           37
75 to 100                    118           55
100 and over                  83           66

Totals                    11,700       11,554

Aggregate of incomes: Calculated, £166,000,000; recorded, £145,000,000.

If a doubly-logarithmic diagram is drawn, the range over
which a straight line is a good approximation can be seen, and
a trial value of $\alpha$ is suggested by its gradient. This value may
be tested by choosing two values for x, say $x_1$ and $x_2$, which
give values of N represented by points lying nearly on the
empirical line, say $N_1$ and $N_2$.


Then

$\alpha = \dfrac{\log N_1 - \log N_2}{\log x_2 - \log x_1}$.

If we take $x_1 = 5{,}000$, $x_2 = 45{,}000$ from the table just
given, we have $N_1 = 11{,}554$ and $N_2 = 321$; hence $\alpha = 1.63$.
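
The two-point estimate can be reproduced as follows. This is a sketch only: the function names are illustrative, and the second function simply applies equation (87) with the fitted constants $\alpha = 1.5$, $\log A = 9.618$ quoted above.

```python
from math import log10


def pareto_alpha(x1, n1, x2, n2):
    """Gradient of the line through (log x1, log N1) and (log x2, log N2)."""
    return (log10(n1) - log10(n2)) / (log10(x2) - log10(x1))


def incomes_in_range(A, alpha, lower, upper):
    """Number of incomes between `lower` and `upper` under N = A / x**alpha."""
    return A * (lower ** -alpha - upper ** -alpha)


alpha = pareto_alpha(5_000, 11_554, 45_000, 321)
print(round(alpha, 2))  # 1.63

# Calculated number for the range 15,000 to 20,000 with the fitted constants.
fitted = incomes_in_range(10 ** 9.618, 1.5, 15_000, 20_000)
print(round(fitted))
```

The fitted figure comes out close to the 790 shown in the table; small differences arise from the rounding of log A.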

This method, however, assumes that the number up to
the maximum income conforms to the law, which is not generally
the case; and in practice it is better to take three values
$x_1, x_2, x_3$.

Then the equation

$f(\alpha) = \dfrac{x_1^{-\alpha} - x_2^{-\alpha}}{x_1^{-\alpha} - x_3^{-\alpha}} = \dfrac{\text{number of incomes from } x_1 \text{ to } x_2}{\text{number of incomes from } x_1 \text{ to } x_3} = k$, a known quantity,

is sufficient to give $\alpha$. Suppose $\alpha = 1.6$ is the trial value;
calculate $f(\alpha) - k$ for $\alpha = 1.5$, $\alpha = 1.55$, $\alpha = 1.60$, $\alpha = 1.65$ in
succession, and obtain by interpolation a value of $\alpha$ which
makes $f(\alpha) = k$ as nearly as possible. Then test the resulting
value against other parts of the record. Given $\alpha$, A is easily
found.
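
The procedure of tabulating $f(\alpha) - k$ at trial values and interpolating can be sketched as follows. The choice of $x_1 = 5{,}000$, $x_2 = 15{,}000$, $x_3 = 45{,}000$ is an assumption made here for illustration; the counts are the recorded numbers from the table above, and the function name `share` is hypothetical.

```python
def share(alpha, x1, x2, x3):
    """f(alpha): incomes from x1 to x2 as a fraction of those from x1 to x3."""
    return (x1 ** -alpha - x2 ** -alpha) / (x1 ** -alpha - x3 ** -alpha)


x1, x2, x3 = 5_000, 15_000, 45_000               # illustrative choice of points
recorded_x1_to_x2 = 7_411 + 2_029
recorded_x1_to_x3 = recorded_x1_to_x2 + 787 + 438 + 382 + 186
k = recorded_x1_to_x2 / recorded_x1_to_x3

trial_alphas = [1.50, 1.55, 1.60, 1.65]
errors = [share(a, x1, x2, x3) - k for a in trial_alphas]

# Linear interpolation between the first pair of trial values that bracket zero.
alpha = None
for i in range(len(trial_alphas) - 1):
    e_lo, e_hi = errors[i], errors[i + 1]
    if e_lo * e_hi <= 0 and e_lo != e_hi:
        a_lo, a_hi = trial_alphas[i], trial_alphas[i + 1]
        alpha = a_lo + (a_hi - a_lo) * (0 - e_lo) / (e_hi - e_lo)
        break

print(alpha)
```

With these three points the interpolated value falls between 1.50 and 1.55, which agrees with the fit $\alpha = 1.5$ over the lower ranges rather than with the two-point value 1.63.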

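Another method, which perhaps uses the data more completely,
is to use the equation,

Average income in the range $x_1$ to $x_2$ = $\dfrac{\alpha}{\alpha-1}\cdot\dfrac{x_1^{1-\alpha} - x_2^{1-\alpha}}{x_1^{-\alpha} - x_2^{-\alpha}}$.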

For  various  workings  on  the  formula  see  House  of  Commons 
Committee  on  the  Income  Tax  (H.  of  C.  No.  365  of  1906, 
pp.  220-30,  240-1,  245-6). 


Makeham's Formula.


The equation $-\dfrac{1}{y}\dfrac{dy}{dx} = a + bc^x$ leads to a formula
important in actuarial work.

For convenience in integration write

$a = -\log s$, $b = -\log c \times \log g$.

Then $\log y = x\log s + c^x\log g + \text{const.}$,

$y = k\,s^x\,g^{c^x}$, where k, s, g, c are constants . . . . (88)

This is Makeham's formula where y is the number of a
given generation who survive to the age x, and is written $l_x$.

The ratio of the number of persons dying in an interval,
$\delta x$, to the number alive at the beginning of the interval, divided
by the duration of the interval, is

$\dfrac{l_x - l_{x+\delta x}}{l_x\,\delta x} = -\dfrac{1}{l_x}\dfrac{dl_x}{dx}$ . . . . (89)

when the interval is indefinitely diminished. This expression
is written $\mu_x$ and is called "the force of mortality."

The differential equation of the formula then gives

$\mu_x = a + bc^x$ . . . . (90)

and the assumption is that the force of mortality is the sum of
two quantities one of which is a constant a, and the other
$bc^x = \mu'_x$, say, is such that it increases in a constant geometrical
progression, for $\dfrac{D_x\mu'_x}{\mu'_x} = \log c$.
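
The connection between (88), (89) and (90) can be checked numerically: differentiating $l_x$ from (88) should return the force of mortality $a + bc^x$. The sketch below does this with a central difference. The constants k, s, g, c are arbitrary illustrative values, not figures from any mortality table, and natural logarithms are assumed.

```python
from math import isclose, log


def makeham_lx(x, k, s, g, c):
    """Survivors to age x under formula (88): k * s**x * g**(c**x)."""
    return k * s ** x * g ** (c ** x)


def force_of_mortality(x, s, g, c):
    """Formula (90), with a = -log s and b = -log c * log g."""
    a = -log(s)
    b = -log(c) * log(g)
    return a + b * c ** x


# Illustrative constants only.
k, s, g, c = 100_000.0, 0.999, 0.9995, 1.10
x, h = 40.0, 1e-4

# Formula (89) with a small interval taken symmetrically about x.
numeric = -(makeham_lx(x + h, k, s, g, c) - makeham_lx(x - h, k, s, g, c)) / (
    2 * h * makeham_lx(x, k, s, g, c)
)
analytic = force_of_mortality(x, s, g, c)

print(numeric, analytic)
print(isclose(numeric, analytic, rel_tol=1e-6))  # True
```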

A more complicated form, obtainable by writing $a + a'x$ for
a above, is used by Mr. Hardy (loc. cit. p. 88) and in the
Report for 1912-13 on the Administration of the National
Insurance Act, Part I., p. 585 (Cd. 6907).

He also uses a hyperbolic equation for graduation,

$\log\dfrac{y}{N - y} = k + \dfrac{m}{a + x} + \dfrac{n}{b + x}$,

where y is the number of husbands below age x and N is the
total number of husbands (Construction of Mortality Tables,
pp. 50-1 and Cd. 6907, p. 595).


CHAPTER  VI. 


THEORY  OF  CORRELATION. 

Introductory. 

One  of  the  principal  classes  of  problems  in  statistics  is  to 
determine  whether  phenomena  are  independent  of  each  other, 
and  if  not,  to  measure  their  dependence. 

In  this  chapter  we  consider  principally  the  problem  as  it 
arises  in  connection  with  two  or  three  variable  quantities,  the 
causes  of  whose  variation  may  have  something  in  common. 

Suppose that we have pairs of observations, e.g. the height
and span of a man, the heights of pairs of brothers, the income
and rent of a household. Let the pairs of measurements be
$(X_1, Y_1)$, $(X_2, Y_2)$, etc., and let there be a frequency group of
the X's and another of the Y's, with averages $\bar{x}$, $\bar{y}$. Then if
X and Y are completely independent, when we are told a
value of X, we shall have no knowledge about the magnitude
of the corresponding value of Y; the chance that it shall have
any particular deviation from $\bar{y}$ is simply that given by its
own frequency curve; but if there is anything common to
X and Y in the causes of their variations, the statement of
the value of an X will presumably affect the probability of the
deviations of the corresponding Y.

X and Y may of course be connected rigidly by an equation,
as, for example, X lbs. and Y kilos. may be different ways of
expressing the weight of the same body, so that X = 2.204Y,
and Y/X is constant. In the cases with which we have to
deal, however, the connection is not one of direct relation;
when X is given, Y is not determinate, but in a series of
measurements (e.g. of height) we shall find for the same X
varying values of Y.

If the average or shape of the frequency curve of the Y's
associated with a given X is not the same as that for all values
of Y when the sorting by values of X is not made, then there
is something common to the two quantities, and they are said
to be correlated.

An obvious first method of analysis is to arrange the
observed values of Y in "arrays," each array containing those
values for which the X is the same, as in an ordinary cross
table. The average $\bar{Y}_t$ of an array of Y's when $X = X_t$
would, if there were complete independence and the number
of observations were great, tend to equal $\bar{y}$, the average of
all the Y's; if it differs from $\bar{y}$ by more than would be expected
in random sampling, then there is an indication that the value
of Y is not independent of that of X.


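[Figure: axes XX′ and YY′ through O, with M on XX′ and N on YY′; crosses (×) mark positions of Q, the averages of arrays of Y's, and circles (o) mark positions of R, the averages of arrays of X's.]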

In the figure let O represent the averages of the X's and of
the Y's. Let $x_t$, $y_t$ be the excess of $X_t$, $Y_t$ over their averages,
$\bar{x}$, $\bar{y}$, and let OM be a selected $x_t$ and MQ be the average of the
y's in that array, so that MQ = $\bar{Y}_t - \bar{y}$; and let the marks
× × . . indicate various positions of Q.

Then if Y is independent of X, × × × will lie away from XX′
only if the observations are not sufficiently numerous to give
the true averages. If Y is not independent of X, Q will tend
to have a definite locus, which a free-hand line drawn through
its various positions will approximately define. If this is
the case we can write $\bar{Y}_t - \bar{y} = f(X_t - \bar{x})$, so that when $X_t$
is given, though the actual value of $Y_t$ is not known, yet $\bar{Y}_t$,
the average of the array found in repeated selections, is
approximately determinate.

Similarly, if we take an array of X's corresponding to a
selected value of $y_t$ (ON), the averages of these arrays, such as
R, marked o o o, tend to lie on a curve $\bar{X}_t - \bar{x} = f_1(Y_t - \bar{y})$,
where $f_1$ is not the same as f.

The locus of Q is called the curve of "regression" of
Y on X, and that of R the curve of regression of X on Y.

It frequently happens that these curves are approximately
rectilinear, especially in the neighbourhood of O, so that
$\bar{Y}_t - \bar{y} = kx_t$ approximately, where k is a constant when $x_t$
is small.

The gradient of this line, k, equals $\dfrac{MQ}{OM}$ approximately, for
any small value of OM, and may presumably be found by some
method of averaging the various values of $\dfrac{MQ}{OM}$. We return
to this on p. 355 and p. 364.

We  can  approach  the  problem  from  a  different  aspect  as 
follows. 

Let there be n magnitudes $X_1, X_2 \ldots$, and n magnitudes
$Y_1, Y_2 \ldots$

Select at random an X, and independently select a Y, and
form the product XY. Then in the long run, when a particular
X happens to be selected, the various values of Y will come
with equal frequency, and in the long run each of the $n^2$
products $X_1Y_1, X_1Y_2 \ldots X_2Y_1 \ldots X_nY_n$ will occur with
the same frequency.

The sum of a very great number, N, of the products

$= S(XY) = S(\bar{x} + x)(\bar{y} + y) = N\bar{x}\bar{y} + \bar{x}\cdot Sy + \bar{y}\cdot Sx + Sxy.$

Here Sy tends to be $\dfrac{N}{n}(y_1 + y_2 + \ldots + y_n) = 0$, and Sx also
tends to 0.

S(xy) tends to be $\dfrac{N}{n^2}(x_1y_1 + x_1y_2 + \ldots + x_2y_1 + \ldots + x_ny_n) = \dfrac{N}{n^2}\cdot Sx\cdot Sy = 0.$

Hence S(XY) tends to $N\bar{x}\bar{y}$, and the mean of the product
XY tends to equal the product of the means of X and of Y.

But if the selection of Y is not independent of that of X,
the $n^2$ products $x_1y_1$ etc. do not come with equal frequency,
and Mean XY = $\bar{x}\bar{y}$ + Mean xy.


Again if $\bar{m}$ is the unweighted average of $M_1, M_2 \ldots$, where
in the notation of pp. 319, 320 $M_t = \bar{m} + m_t$, and $\bar{m}_w$ is the
weighted average, where $W_t$, the weight for $M_t$, $= \bar{w} + w_t$, $\bar{w}$
being the average of the weights,

$\bar{m}_w = \dfrac{S(\bar{w} + w_t)(\bar{m} + m_t)}{n\bar{w}} = \dfrac{1}{n\bar{w}}\left(n\bar{w}\bar{m} + Sw_tm_t\right) = \bar{m}\left(1 + \text{Mean}\,\dfrac{w_t}{\bar{w}}\cdot\dfrac{m_t}{\bar{m}}\right)$ . . . . (91)

$\bar{m}_w = \bar{m}$ only if $Sw_tm_t = 0$, and this will only be the case if
n is large, and if on the whole a large weight is not more often
found with a large than with a small quantity, or vice versa.
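
Formula (91) is an algebraic identity and can be confirmed on any set of quantities and weights. In this sketch the four quantities and their weights are made-up values chosen only for illustration.

```python
from math import isclose

quantities = [10.0, 12.0, 15.0, 20.0]   # illustrative M's
weights = [1.0, 2.0, 3.0, 2.0]          # illustrative W's
n = len(quantities)

m_bar = sum(quantities) / n             # unweighted average
w_bar = sum(weights) / n                # average weight

weighted_average = sum(w * m for w, m in zip(weights, quantities)) / sum(weights)

# Mean of the products of deviations, w_t * m_t.
mean_product = sum(
    (w - w_bar) * (m - m_bar) for w, m in zip(weights, quantities)
) / n
via_formula_91 = m_bar * (1 + mean_product / (w_bar * m_bar))

print(weighted_average, via_formula_91)
print(isclose(weighted_average, via_formula_91))  # True
```

Here the larger weights fall on the larger quantities, so the mean product is positive and the weighted average exceeds the unweighted one.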


On p. 288 it was shown that with the notation above, the
mean of $(x + y)^2$ was $\sigma_x^2 + \sigma_y^2$, where $\sigma_x$ and $\sigma_y$ are the
standard deviations of X and Y, if the selections of X and
Y are quite independent.

It is easy to see that when there is dependence the analysis
is modified and leads to

$s^2 = \text{Mean}\,(x + y)^2 = \sigma_x^2 + \sigma_y^2 + 2\cdot\text{Mean}\,xy$ . . . . (92)


The  Coefficient  of  Correlation. 

Hence it appears that the quantity Mean xy enters into
many expressions when X and Y are not independent and
that in itself it gives an indication of the existence and amount
of correlation. However, its magnitude depends on the
units used in measuring x and y so that there is no natural
scale for it, and consequently a quantity defined as follows is
used in preference to it.

If $X_1Y_1, X_2Y_2, \ldots X_tY_t \ldots X_nY_n$ are pairs of measurements,
and the averages and standard deviations of the X's and Y's are
$\bar{x}$, $\sigma_x$, $\bar{y}$, $\sigma_y$, then the coefficient of correlation between X, Y is written
$r_{xy}$, where

$r_{xy} = \dfrac{S\{(X_t - \bar{x})(Y_t - \bar{y})\}}{n\sigma_x\sigma_y} = \dfrac{1}{n}\,S\left(\dfrac{x_t}{\sigma_x}\cdot\dfrac{y_t}{\sigma_y}\right)$

(taking $X_t = \bar{x} + x_t$, $Y_t = \bar{y} + y_t$)

$= \dfrac{1}{n\sigma_x\sigma_y}\{S(X_tY_t) - \bar{x}\,SY_t - \bar{y}\,SX_t + n\bar{x}\bar{y}\}$

$= \dfrac{1}{n\sigma_x\sigma_y}\{S(X_tY_t) - n\bar{x}\bar{y}\}$, . . . . (93)

since $SY_t = n\bar{y}$, $SX_t = n\bar{x}$.

In the examples just given

Mean XY = $\bar{x}\bar{y} + r_{xy}\sigma_x\sigma_y$,

$\bar{m}_w = \bar{m}\left(1 + r_{mw}\cdot\dfrac{\sigma_m}{\bar{m}}\cdot\dfrac{\sigma_w}{\bar{w}}\right)$,

$s^2 = \sigma_x^2 + \sigma_y^2 + 2r_{xy}\sigma_x\sigma_y$.

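Write r for $r_{xy}$. It can readily be shown that r is never greater than 1
or less than −1.

For $n^2r^2\sigma_x^2\sigma_y^2 = (Sx_ty_t)^2$;

but $n^2\sigma_x^2\sigma_y^2 - (Sx_ty_t)^2$

$= (x_1^2 + x_2^2 + \ldots)(y_1^2 + y_2^2 + \ldots) - (x_1y_1 + x_2y_2 + \ldots)^2$,
since $n\sigma_x^2 = x_1^2 + x_2^2 + \ldots$, and $n\sigma_y^2 = y_1^2 + y_2^2 + \ldots$,

$= (x_1y_2 - x_2y_1)^2 + (x_1y_3 - x_3y_1)^2 + \ldots + (x_sy_t - x_ty_s)^2 + \ldots + (x_ny_{n-1} - x_{n-1}y_n)^2$,

which is > 0, unless $x_1y_2 - x_2y_1 = 0 = x_1y_3 - x_3y_1 = \ldots$ and

$\dfrac{x_1}{y_1} = \dfrac{x_2}{y_2} = \dfrac{x_3}{y_3} = \ldots = \dfrac{x_n}{y_n} = \pm\dfrac{\sigma_x}{\sigma_y}$,

in which case the expression = 0, and r = ±1.

Hence $1 - r^2 = \dfrac{S(x_sy_t - x_ty_s)^2}{n^2\sigma_x^2\sigma_y^2} > 0$, and 1 > r > −1, unless y varies directly as x,
and then r = +1 or −1 . . . . (94)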
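
Formula (93) and the limits in (94) can be illustrated by a short calculation. The function below follows (93) literally; the six pairs of heights and spans are invented values for illustration, and the factor 2.54 simply converts the first measurement to different units.

```python
from math import isclose, sqrt


def correlation(xs, ys):
    """Coefficient of correlation by formula (93)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    sum_products = sum(x * y for x, y in zip(xs, ys))
    return (sum_products - n * x_bar * y_bar) / (n * sigma_x * sigma_y)


heights = [64, 66, 67, 69, 71, 73]   # invented values
spans = [65, 66, 69, 70, 72, 75]     # invented values

r = correlation(heights, spans)
r_other_units = correlation([2.54 * h for h in heights], spans)
r_rigid = correlation(heights, [2.204 * h for h in heights])

print(r)               # lies between -1 and +1
print(r_other_units)   # unchanged by the change of units
print(r_rigid)         # 1, up to rounding, when y = x * constant
```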

Hence r is a quantity which depends on all the observations,
is zero when independence is complete and Mean xy = 0,
is independent of the units in which X and Y are measured,
increases whenever a positive $x_t$ is found with a positive $y_t$
or a negative $x_t$ with a negative $y_t$, but only reaches the value
+1 (which it can never exceed) when x and y are connected
rigidly by the equation y = x × constant. If positive x's
are found with negative y's and vice versa, r varies from 0
to −1.

r is therefore a sensitive measurement of the amount of
correlation.

To distinguish it from other measurements it is sometimes
called the sum-product coefficient of correlation.

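If the pairs are grouped in arrays, such as $x_s, {}_1y_s$; $x_s, {}_2y_s, \ldots$
and $\bar{y}_s$ is the average of the $n_s$ quantities ${}_1y_s, {}_2y_s \ldots$, then

$Sxy = Sx_s\cdot n_s\bar{y}_s$, and $\dfrac{\sigma_y}{\sigma_x}\cdot r = S\left\{\dfrac{n_sx_s^2}{n\sigma_x^2}\cdot\dfrac{\bar{y}_s}{x_s}\right\}$,

where $Sn_sx_s^2 = n\sigma_x^2$.

$\dfrac{\sigma_y}{\sigma_x}\,r$ is therefore a weighted average of the ratios $\dfrac{MQ}{OM}$ on p. 352,
and $y = r\,\dfrac{\sigma_y}{\sigma_x}\cdot x$ is an approximation to the locus of Q.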


On  the  following  pages  we  examine  the  circumstances 
which  give  r  various  numerical  values,  study  the  distribution 
of  X,Y  on  various  hypotheses,  and  find  the  equations  of  the 
lines  of  regression. 
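
The statement above, that $r\sigma_y/\sigma_x$ is a weighted average of the array ratios $\bar{y}_s/x_s$ with weights $n_sx_s^2/n\sigma_x^2$, can be checked on a small table. The eight pairs below are invented deviations (each column sums to zero, and no array has $x_s = 0$); they are an illustration, not data from the text.

```python
from collections import defaultdict
from math import isclose

# Pairs (x, y) already measured from their averages.
pairs = [(-2, -3), (-2, -1), (-1, -1), (-1, 0), (1, 0), (1, 2), (2, 1), (2, 2)]
n = len(pairs)

n_sigma_x_sq = sum(x * x for x, _ in pairs)           # n * sigma_x^2
slope = sum(x * y for x, y in pairs) / n_sigma_x_sq   # r * sigma_y / sigma_x

# Group the y's into arrays by their x.
arrays = defaultdict(list)
for x, y in pairs:
    arrays[x].append(y)

weighted_average = 0.0
for x_s, ys in arrays.items():
    n_s = len(ys)
    y_bar_s = sum(ys) / n_s                 # MQ for this array
    weight = n_s * x_s ** 2 / n_sigma_x_sq  # the weights sum to 1
    weighted_average += weight * (y_bar_s / x_s)   # MQ / OM

print(slope, weighted_average)
print(isclose(slope, weighted_average))  # True
```

Arrays far from O carry the largest weights, since the weight is proportional to $x_s^2$.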


Nature  of  r. 

Let X and Y be two variable quantities which depend on other
variables U, V, W in such a way that

$X_t = {}_1U_t + {}_2U_t + \ldots + {}_pU_t + {}_1V_t + {}_2V_t + \ldots + {}_qV_t$,

$Y_t = {}_1U_t + {}_2U_t + \ldots + {}_pU_t + {}_1W_t + {}_2W_t + \ldots + {}_qW_t$,

where ${}_1U_t$ is selected at random from a frequency group of any
form whose mean is ${}_1\bar{u}$ and standard deviation ${}_1\sigma_u$, ${}_2U_t$ is selected
independently from another group, and so on throughout the U's,
V's and W's. p and q are any integers.

Write ${}_1U_t = {}_1\bar{u} + {}_1u_t$ etc., and $X_t = \bar{x} + x_t$, $Y_t = \bar{y} + y_t$ where
$\bar{x}$ and $\bar{y}$ are the means of all possible values of X and Y. Let $\sigma_x$,
$\sigma_y$ be the standard deviations of X and Y.

Then $x_t = {}_1u_t + {}_2u_t + \ldots + {}_pu_t + {}_1v_t + {}_2v_t + \ldots + {}_qv_t$,
$y_t = {}_1u_t + {}_2u_t + \ldots + {}_pu_t + {}_1w_t + {}_2w_t + \ldots + {}_qw_t$;

$\sigma_x^2 = {}_1\sigma_u^2 + \ldots + {}_p\sigma_u^2 + {}_1\sigma_v^2 + \ldots + {}_q\sigma_v^2$,
and $\sigma_y^2 = {}_1\sigma_u^2 + \ldots + {}_p\sigma_u^2 + {}_1\sigma_w^2 + \ldots + {}_q\sigma_w^2$,

since, by hypothesis, the u's, v's, and w's are all independent of
each other (p. 288).

If ${}_1\sigma_u = {}_2\sigma_u = \ldots = \sigma_u$, ${}_1\sigma_v = {}_2\sigma_v = \ldots = \sigma_v$, and ${}_1\sigma_w = {}_2\sigma_w = \ldots = \sigma_w$,
or if $\sigma_u^2$, $\sigma_v^2$, $\sigma_w^2$ are mean values of the U, V, and W standard
deviations squared, then

$\sigma_x^2 = p\sigma_u^2 + q\sigma_v^2$, $\sigma_y^2 = p\sigma_u^2 + q\sigma_w^2$.

Also, mean $x_ty_t$ = mean ${}_1u_t^2$ + mean ${}_2u_t^2 + \ldots$
+ mean ${}_1u_t\cdot{}_1w_t + \ldots$ + mean ${}_1v_t\cdot{}_1w_t + \ldots$
$= p\sigma_u^2$,

for, since, by hypothesis, the selections of the various U's, V's and
W's are independent of each other, such a term as mean ${}_1u_t\cdot{}_1w_t$ is
zero in the long run.

Hence

$r = \dfrac{\text{mean } x_ty_t}{\sigma_x\cdot\sigma_y} = \dfrac{p\sigma_u^2}{\sqrt{(p\sigma_u^2 + q\sigma_v^2)(p\sigma_u^2 + q\sigma_w^2)}}$,

and, in particular, if $\sigma_u = \sigma_v = \sigma_w$,

$r = \dfrac{p}{p + q}$ . . . . (96)


This  is  the  simplest  conception  of  the  numerical  value  of 
r  ;  expressed  in  words  it  shows  that  the  correlation  coefficient 
tends  to  be  the  ratio  of  the  number  of  causes  common  in  the 
genesis  of  two  variables  to  the  whole  number  of  independent 
causes  on  which  each  depends. 
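
The result $r = p/(p+q)$ can be illustrated by simulation. In the sketch below every cause is drawn from the same rectangular (uniform) distribution, chosen deliberately to show that the frequency group need not be normal; the function name, the number of trials and the seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt


def simulated_r(p, q, trials=20_000, seed=0):
    """Correlation of X and Y sharing p causes and having q separate causes each."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(trials):
        common = sum(rng.uniform(-1, 1) for _ in range(p))        # the U's
        xs.append(common + sum(rng.uniform(-1, 1) for _ in range(q)))  # plus V's
        ys.append(common + sum(rng.uniform(-1, 1) for _ in range(q)))  # plus W's
    x_bar = sum(xs) / trials
    y_bar = sum(ys) / trials
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = simulated_r(p=3, q=1)
print(r)  # close to 3 / (3 + 1) = 0.75
```

The simulated value differs from 0.75 only by sampling error, which shrinks as the number of trials is increased.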


If constants a, b, c, d are introduced so that

$x_t = a_1\cdot{}_1u_t + \ldots + a_p\cdot{}_pu_t + b_1\cdot{}_1v_t + \ldots + b_q\cdot{}_qv_t$,
$y_t = c_1\cdot{}_1u_t + \ldots + c_p\cdot{}_pu_t + d_1\cdot{}_1w_t + \ldots + d_q\cdot{}_qw_t$, . . . . (97)

then $\sigma_x^2 = S\cdot a^2\sigma_u^2 + S\cdot b^2\sigma_v^2$, $\sigma_y^2 = S\cdot c^2\sigma_u^2 + S\cdot d^2\sigma_w^2$,

mean $x_ty_t = S\,ac\,\sigma_u^2$,

and the expression for r can be readily written down.


The  Correlation  Surface. 


Consider  the  case  where  the  frequency  curves  of  the  U's, 
V's  and  W's  are  normal,  and  in  the  first  place  let  us  examine 
the  grouping  of  X,Y  in  the  simple  case  where 

xt  =  ut  +  vt}  yt  =  ut  +  wt. 

The  chance  of  the  concurrence  of  selections  ut,  vt,  wt  is 

w  j.  w,2 

x 


X 


( TuVzir  (Tv's/Z'IT 

Eliminate  vt  and  wt. 


_  V 
e  2<tv2 


awV  27 r 


e  2<V 
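The chance of obtaining $x_t$ (to $x_t + \delta x$), $y_t$ (to $y_t + \delta y$), when $u_t$ (to $u_t + \delta u$) has been selected from the U group, is

$$\frac{1}{\sigma_u\sigma_v\sigma_w(2\pi)^{3/2}}\; e^{-\frac{1}{2}\left\{\frac{u_t^2}{\sigma_u^2} + \frac{(x_t - u_t)^2}{\sigma_v^2} + \frac{(y_t - u_t)^2}{\sigma_w^2}\right\}}\,\delta x\,\delta y\,\delta u$$

$$= \frac{1}{\sigma_u\sigma_v\sigma_w(2\pi)^{3/2}}\; e^{-\frac{1}{2}k\{u_t - lx_t - my_t\}^2 - \frac{1}{2}(ax_t^2 + 2hx_ty_t + by_t^2)}\,\delta x\,\delta y\,\delta u,$$

where

$$k = \frac{1}{\sigma_u^2} + \frac{1}{\sigma_v^2} + \frac{1}{\sigma_w^2}, \quad kl = \frac{1}{\sigma_v^2}, \quad km = \frac{1}{\sigma_w^2},$$

$$a = \frac{1}{\sigma_v^2} - kl^2, \quad b = \frac{1}{\sigma_w^2} - km^2, \quad h = -klm.$$

The chance (say $P_{xy}$) of obtaining $x_t$, $y_t$ from any value of $u$ is obtained by adding the chances of obtaining them from an assigned value of $u$.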

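Hence, writing $x$, $y$ for $x_t$, $y_t$,

$$P_{xy} = \frac{e^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}}{\sigma_u\sigma_v\sigma_w(2\pi)^{3/2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}k(u_t - lx - my)^2}\,du_t = \frac{e^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}}{\sigma_u\sigma_v\sigma_w\,2\pi\sqrt{k}}.$$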

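Now $\sigma_x^2 = \sigma_u^2 + \sigma_v^2$, $\sigma_y^2 = \sigma_u^2 + \sigma_w^2$, $r\sigma_x\sigma_y = \sigma_u^2$,

$$\therefore\ k = \frac{(\sigma_u^2 + \sigma_v^2)(\sigma_u^2 + \sigma_w^2) - \sigma_u^4}{\sigma_u^2\sigma_v^2\sigma_w^2} = \frac{\sigma_x^2\sigma_y^2(1 - r^2)}{\sigma_u^2\sigma_v^2\sigma_w^2},$$

$$a = kl - kl^2 = l(k - kl) = l\left(\frac{1}{\sigma_u^2} + \frac{1}{\sigma_w^2}\right) = \frac{\sigma_y^2}{k\sigma_v^2 \cdot \sigma_u^2\sigma_w^2} = \frac{1}{\sigma_x^2(1 - r^2)},$$

and similarly

$$b = \frac{1}{\sigma_y^2(1 - r^2)},$$

$$h = -klm = -\frac{1}{k} \cdot kl \cdot km = -\frac{\sigma_u^2}{\sigma_x^2\sigma_y^2(1 - r^2)} = -\frac{r}{\sigma_x\sigma_y(1 - r^2)}.$$

$$P_{xy} = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)} \qquad (98)$$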


By an extension of this method it is shown (Elderton, Frequency Curves, pp. 109 seq., following Pearson, Trans. of Royal Society, vol. 187 (1896), A, 175) that if $x_t$, $y_t$ are formed by the weighted sum of a number of variables, all of normal frequency, as expressed in the equations (97) above, $P_{xy}$ is of the form $Ke^{-\frac{1}{2}(ax^2 + 2hxy + by^2)}$ as just found, and when the conditions that this surface shall have unit volume, standard deviations $\sigma_x$, $\sigma_y$ and mean product $r\sigma_x\sigma_y$ are expressed by integration, the values of $K$, $a$, $h$, $b$ are the same as in the simple case discussed.

Though  this  method  is  of  considerable  interest,  and  by  it 
the  measurement  of  correlation  by  the  product-sum  formula 
was  introduced  into  modern  statistics  by  Professor  Pearson, 
its  importance  is  greatly  diminished  by  the  assumption  that 
the  elemental  frequency  curves  are  normal.  The  following 
analysis  is  free  from  this  assumption  ;  it  is  derived  from 
Professor  Edgeworth’s  paper  on  The  Law  of  Great  Numbers, 
to  which  reference  has  already  been  made. 


Edgeworth's Method.

Let

$$x_t = {}_1u_t + {}_2u_t + \dots + {}_pu_t + \dots + {}_nu_t, \qquad y_t = {}_1v_t + {}_2v_t + \dots + {}_pv_t + \dots + {}_nv_t,$$

where ${}_1u_t$ is the deviation from its mean of a quantity selected from a curve of frequency whose standard deviation is ${}_1\sigma_u$, and ${}_2u_t \dots {}_nu_t$, ${}_1v_t \dots {}_nv_t$ have similar meanings. Let the selection of the various $u_t$'s be quite independent of each other, so that mean ${}_1u_t \cdot {}_2u_t$ etc. tend to zero, and let the $v_t$'s be similarly independent; but let some, at any rate, of the $v$'s be not independent of the $u$'s, so that mean ${}_1u_t \cdot {}_1v_t$, mean ${}_2u_t \cdot {}_2v_t$, ... mean ${}_nu_t \cdot {}_nv_t$ do not all tend to zero. Such a quantity as mean ${}_1u_t \cdot {}_2v_t$ is, however, to be taken as tending to zero.

Let $n$ be large and $\frac{1}{\sqrt{n}}$ negligible, and the other conditions described above (p. 299) be satisfied, so that the curves of frequency of $x$ and $y$ taken separately are normal curves of error.

Then $\sigma_x^2 = S({}_p\sigma_u^2)$, $\sigma_y^2 = S({}_p\sigma_v^2)$,

and mean $xy = S(\text{mean } {}_pu_t \cdot {}_pv_t)$, .... (99)

where $p$ is any integer from 1 to $n$.

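Now rotate the axes on which $x$ and $y$ are measured through an angle $\theta$ determined by $\tan 2\theta = \dfrac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$, where $r = \dfrac{\text{mean } xy}{\sigma_x\sigma_y}$ is the coefficient of correlation between $x$ and $y$.

Thus we write

$$X_t = x_t\cos\theta + y_t\sin\theta, \qquad Y_t = x_t\sin\theta - y_t\cos\theta.$$

Similarly write

$${}_pU_t = {}_pu_t\cos\theta + {}_pv_t\sin\theta, \qquad {}_pV_t = {}_pu_t\sin\theta - {}_pv_t\cos\theta.^*$$

Then $X_t = S\,{}_pU_t$ and $Y_t = S\,{}_pV_t$.

Mean $X_tY_t = \sin\theta\cos\theta\,(\sigma_x^2 - \sigma_y^2) - \cos 2\theta \cdot (\text{mean } xy) = 0$, from the value assigned to $\tan 2\theta$.

$$\sigma_X^2 = \cos^2\theta \cdot \sigma_x^2 + \sin^2\theta \cdot \sigma_y^2 + \sin 2\theta \cdot r\sigma_x\sigma_y = S(\text{mean } {}_pU_t^2)$$

$$\sigma_Y^2 = \sin^2\theta \cdot \sigma_x^2 + \cos^2\theta \cdot \sigma_y^2 - \sin 2\theta \cdot r\sigma_x\sigma_y = S(\text{mean } {}_pV_t^2)$$

$$\therefore\ \sigma_X^2 + \sigma_Y^2 = \sigma_x^2 + \sigma_y^2$$

$$\sigma_X^2 - \sigma_Y^2 = (\sigma_x^2 - \sigma_y^2)\cos 2\theta + 2r\sigma_x\sigma_y\sin 2\theta = (\sigma_x^2 - \sigma_y^2)\sec 2\theta = \sqrt{(\sigma_x^2 - \sigma_y^2)^2 + 4r^2\sigma_x^2\sigma_y^2}$$

$$4\sigma_X^2\sigma_Y^2 = 4\sigma_x^2\sigma_y^2(1 - r^2)$$

$$\text{mean } {}_pU_t \cdot {}_pV_t = \sin\theta\cos\theta\,({}_p\sigma_u^2 - {}_p\sigma_v^2) - \cos 2\theta\,(\text{mean } {}_pu_t \cdot {}_pv_t).$$

$$S(\text{mean } {}_pU_t \cdot {}_pV_t) = \text{mean } X_tY_t = 0 \qquad (100)$$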

Now  follow  the  method  of  pp.  296-7  above  for  calculating 
the  moments  of  X  and  Y. 

Let $\alpha$, $\beta$ be any small constants, whose use is to collect similar terms.

$$e^{\alpha X_t + \beta Y_t} = e^{\alpha\,{}_1U_t + \beta\,{}_1V_t} \times e^{\alpha\,{}_2U_t + \beta\,{}_2V_t} \times \dots$$

Expand the exponentials, give $t$ all possible values and take their mean, remembering that the mean of a sum is the sum of the means of its terms, and that the mean of a product of independent factors (as are the factors on the right-hand side) is the product of the means of the factors, and that the mean of first powers is zero.

$$1 + \tfrac{1}{2}\alpha^2\sigma_X^2 + \tfrac{1}{2}\beta^2\sigma_Y^2 + \dots + \frac{\alpha^k\beta^l}{k!\,l!}(\text{mean } X^kY^l) + \dots$$

where $k$, $l$ are any integers,


*  These  meanings  of  X,  Y,  U,  V  are  of  course  not  connected  with  the  use 
of  the  same  letters  on  p.  355  above. 
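= Product of $n$ factors

$$\{1 + \tfrac{1}{2}\alpha^2(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2(\text{mean } {}_pV_t^2) + \alpha\beta\,(\text{mean } {}_pU_t\,{}_pV_t) + \dots\}$$

$$\therefore\ \log\left(1 + \dots + \frac{\alpha^k\beta^l}{k!\,l!}(\text{mean } X^kY^l) + \dots\right),$$

where $k$ or $l$ may be zero,

$$= S\left[\log\{1 + \tfrac{1}{2}\alpha^2(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2(\text{mean } {}_pV_t^2) + \alpha\beta\,(\text{mean } {}_pU_t\,{}_pV_t) + \dots\}\right]$$

= (expanding by the logarithmic series and adding terms)

$$\tfrac{1}{2}\alpha^2\,S(\text{mean } {}_pU_t^2) + \tfrac{1}{2}\beta^2\,S(\text{mean } {}_pV_t^2) + \alpha\beta\,S(\text{mean } {}_pU_t\,{}_pV_t) + \dots$$

$$= \tfrac{1}{2}\alpha^2\sigma_X^2 + \tfrac{1}{2}\beta^2\sigma_Y^2 + \alpha\beta \times 0^* + \text{terms involving } \alpha^3,\ \alpha^2\beta, \text{ etc.}$$

By arguments similar to those on p. 297 it is found that the terms in $\alpha^3$ are of order $\frac{1}{\sqrt{n}}$ in comparison with terms in $\alpha^2$, and hence, when $\frac{1}{\sqrt{n}}$ is neglected,

$$1 + \dots + \frac{\alpha^k\beta^l}{k!\,l!}(\text{mean } X^kY^l) + \dots = e^{\frac{1}{2}\alpha^2\sigma_X^2} \times e^{\frac{1}{2}\beta^2\sigma_Y^2}$$

$$= \left(1 + \tfrac{1}{2}\alpha^2\sigma_X^2 + \dots + \frac{1}{k!}(\tfrac{1}{2}\alpha^2\sigma_X^2)^k + \dots\right)\left(1 + \tfrac{1}{2}\beta^2\sigma_Y^2 + \dots + \frac{1}{l!}(\tfrac{1}{2}\beta^2\sigma_Y^2)^l + \dots\right).$$

Equate coefficients of terms on the two sides of this equation.

We have (when $l = 0$) mean $X^{2k+1} = 0$, mean $X^{2k} = \dfrac{(2k)!}{2^k\,k!}\,\sigma_X^{2k}$, as in the normal curve, and similarly for $Y$ when $k = 0$. All means involving odd powers of $X$ or $Y$ are zero.

$$\text{Mean}\,(X^{2k}Y^{2l}) = \frac{(2k)!}{2^k\,k!}\,\sigma_X^{2k} \cdot \frac{(2l)!}{2^l\,l!}\,\sigma_Y^{2l} \qquad (101)$$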


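These are precisely the mean powers found by integrating the surface

$$\frac{1}{\sigma_X\sqrt{2\pi}}\,e^{-\frac{X^2}{2\sigma_X^2}} \cdot \frac{1}{\sigma_Y\sqrt{2\pi}}\,e^{-\frac{Y^2}{2\sigma_Y^2}},$$

where $X$ is independent of $Y$. Hence, as on pp. 297-8, we may take this equation as giving the frequency of $X$, $Y$.

It remains to transfer back to the original axes.

As already shown $\sigma_X\sigma_Y = \sigma_x\sigma_y\sqrt{1 - r^2}$.

$$X^2\sigma_Y^2 + Y^2\sigma_X^2 = (x\cos\theta + y\sin\theta)^2\sigma_Y^2 + (x\sin\theta - y\cos\theta)^2\sigma_X^2$$

$$= x^2(\cos^2\theta\,\sigma_Y^2 + \sin^2\theta\,\sigma_X^2) + y^2(\sin^2\theta\,\sigma_Y^2 + \cos^2\theta\,\sigma_X^2) - xy\sin 2\theta\,(\sigma_X^2 - \sigma_Y^2).$$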


*  From  equation  (ioo). 
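Then sum of the coefficients of $x^2$ and $y^2$

$$= \sigma_X^2 + \sigma_Y^2 = \sigma_x^2 + \sigma_y^2,$$

and their difference $= \cos 2\theta\,(\sigma_Y^2 - \sigma_X^2) = \sigma_y^2 - \sigma_x^2$;

hence the coefficients are $\sigma_y^2$ and $\sigma_x^2$.

Also $\sin 2\theta\,(\sigma_X^2 - \sigma_Y^2) = \tan 2\theta\,(\sigma_x^2 - \sigma_y^2) = 2r\sigma_x\sigma_y$.

Hence

$$\frac{X^2}{\sigma_X^2} + \frac{Y^2}{\sigma_Y^2} = \frac{x^2\sigma_y^2 + y^2\sigma_x^2 - 2r\sigma_x\sigma_y \cdot xy}{\sigma_x^2\sigma_y^2(1 - r^2)} = \frac{1}{1 - r^2}\left\{\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right\},$$

and the equation of the surface is

$$z = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left\{\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right\}} \qquad (102)$$

as already found (formula (98)) on the simpler hypothesis that the elemental groups have normal frequency.

In this equation $\sigma_x^2 = S\,\sigma_u^2$, $\sigma_y^2 = S\,\sigma_v^2$, $r\sigma_x\sigma_y = S\,(r_p \cdot {}_p\sigma_u \cdot {}_p\sigma_v)$, from equations (99), where $r_p$ is the correlation coefficient between ${}_pu$ and ${}_pv$.

Constants can be introduced in the original equations so that $x = a_1 \cdot {}_1u + a_2 \cdot {}_2u + \dots$ and $y = b_1 \cdot {}_1v + b_2 \cdot {}_2v + \dots$ without affecting the method of analysis.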


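Properties of the Normal Correlation Surface.

The centre is at the average of the $x$ and of the $y$ variables.

$$\text{Volume} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left(\frac{x}{\sigma_x} - \frac{ry}{\sigma_y}\right)^2} \cdot e^{-\frac{y^2}{2\sigma_y^2}}\,dx\,dy = 1 \qquad (103)$$

$$\text{Second moment in } y = \iint zy^2\,dx\,dy = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y^2\, e^{-\frac{x'^2}{2(1 - r^2)\sigma_x^2}}\, e^{-\frac{y^2}{2\sigma_y^2}}\,dx'\,dy,$$

where $x' = x - r\dfrac{\sigma_x}{\sigma_y}\,y$; then integrating in respect of $x'$ we find that the expression

$$= \frac{1}{\sigma_y\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2\sigma_y^2}}\,dy = \sigma_y^2 \text{ by formula (21)} \qquad (104)$$

and similarly the second moment in $x$ is $\sigma_x^2$.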
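Mean product of $xy$ $= \iint zxy\,dx\,dy$, $\pm\infty$ being the limits of integration,

$$= \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \iint \left(x' + r\frac{\sigma_x}{\sigma_y}\,y\right) e^{-\frac{x'^2}{2(1 - r^2)\sigma_x^2}} \cdot y\,e^{-\frac{y^2}{2\sigma_y^2}}\,dx'\,dy$$

$$= \frac{1}{\sigma_y\sqrt{2\pi}} \int r\frac{\sigma_x}{\sigma_y}\,y^2 e^{-\frac{y^2}{2\sigma_y^2}}\,dy = r\sigma_x\sigma_y \qquad (105)$$

$$\iint zyx^3\,dx\,dy = 3r\sigma_x^3\sigma_y, \qquad \iint zx^2y^2\,dx\,dy = (2r^2 + 1)\,\sigma_x^2\sigma_y^2 \qquad (106)$$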
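
The integrals (103) to (106) can be verified arithmetically. The following is a rough sketch that sums the surface (102) over a fine grid by the midpoint rule; the function names, the grid size and the particular values of $\sigma_x$, $\sigma_y$, $r$ are assumptions made only for this illustration.

```python
import math

def density(x, y, sx, sy, r):
    """Ordinate z of the normal correlation surface, formula (102)."""
    c = 1.0 / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - r * r))
    e = (x * x / sx**2 + y * y / sy**2 - 2.0 * r * x * y / (sx * sy))
    return c * math.exp(-e / (2.0 * (1.0 - r * r)))

def surface_moment(f, sx, sy, r, steps=300, span=8.0):
    """Midpoint-rule value of the double integral of f(x, y) * z
    over the rectangle |x| < span*sx, |y| < span*sy."""
    hx = 2.0 * span * sx / steps
    hy = 2.0 * span * sy / steps
    total = 0.0
    for i in range(steps):
        x = -span * sx + (i + 0.5) * hx
        for j in range(steps):
            y = -span * sy + (j + 0.5) * hy
            total += f(x, y) * density(x, y, sx, sy, r)
    return total * hx * hy

sx, sy, r = 2.0, 1.5, 0.6                                  # illustrative values
volume = surface_moment(lambda x, y: 1.0, sx, sy, r)       # (103): 1
second_y = surface_moment(lambda x, y: y * y, sx, sy, r)   # (104): sy^2
mean_xy = surface_moment(lambda x, y: x * y, sx, sy, r)    # (105): r*sx*sy
m31 = surface_moment(lambda x, y: x**3 * y, sx, sy, r)     # (106): 3*r*sx^3*sy
m22 = surface_moment(lambda x, y: x * x * y * y, sx, sy, r)  # (106): (2r^2+1)*sx^2*sy^2
```

The part of the surface lying beyond eight standard deviations is neglected, which does not affect the figures to any place of decimals likely to be wanted.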


The section by every plane parallel to XOZ, YOZ is a normal curve. E.g., if $x = x_1$,

$$z = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{x_1^2}{2\sigma_x^2}} \cdot e^{-\frac{1}{2(1 - r^2)\sigma_y^2}\left(y - r\frac{\sigma_y}{\sigma_x}x_1\right)^2} \qquad (107)$$

which is a normal curve with its centre at $y = r\dfrac{\sigma_y}{\sigma_x}x_1$, standard deviation $\sigma_y\sqrt{1 - r^2}$, and maximum ordinate $\dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\,e^{-\frac{x_1^2}{2\sigma_x^2}}$.

The frequency group of the $y$'s corresponding to an assigned value of $x$ is therefore normal, and its standard deviation is independent of $x$. The average of the group (and its mode and median) is for all values of $x$ on the line $y = r\dfrac{\sigma_y}{\sigma_x}x = x\tan\phi_1$ (say), and this is the line of regression (p. 352).

$r\dfrac{\sigma_y}{\sigma_x}$ is the coefficient of regression of $y$ in relation to $x$.

Similarly for a given value of $y$ the frequency group in $x$ is normal, its standard deviation is $\sigma_x\sqrt{1 - r^2}$, and its average is on the line $x = r\dfrac{\sigma_x}{\sigma_y}y$, say $y = x\tan\phi_2$.

$r\dfrac{\sigma_x}{\sigma_y}$ is the coefficient of regression of $x$ in relation to $y$.

The geometric mean of the two coefficients of regression is $r$.
Horizontal sections are similar ellipses. Thus if $z = z_1$,

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y} = -2(1 - r^2)\log\left(z_1 \cdot 2\pi\sigma_x\sigma_y\sqrt{1 - r^2}\right) \qquad (108)$$

The major axis of any such ellipse (where we take $\sigma_x > \sigma_y$) makes the angle $\theta$ with the axis of $x$, where

$$\tan 2\theta = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}.$$

Now

$$\tan 2\phi_1 = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - r^2\sigma_y^2}, \quad\text{and}\quad \tan 2\phi_2 = \frac{2r\sigma_x\sigma_y}{r^2\sigma_x^2 - \sigma_y^2}.$$

If $r = \pm 1$, $\theta = \phi_1 = \phi_2$, and the surface degrades into the plane $\dfrac{y}{\sigma_y} = \pm\dfrac{x}{\sigma_x}$. Otherwise, when $|r| < 1$ (and $r$ is positive), $\phi_2 > \theta > \phi_1$, and the lines of regression lie on either side of the plane of the major axis of the ellipses. If $\sigma_x = \sigma_y$, $\theta = \dfrac{\pi}{4}$ and the lines of regression are equally inclined to the planes containing the principal axes of the ellipses.
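
The relation between the two lines of regression and the major axis of the ellipses may be illustrated numerically. The sketch below assumes a positive $r$ and $\sigma_x > \sigma_y$; the function `regression_geometry` and the values used are introduced here for illustration only.

```python
import math

def regression_geometry(sx, sy, r):
    """Coefficients of regression and the angles theta, phi1, phi2
    (in radians) for standard deviations sx, sy and correlation r > 0."""
    b_yx = r * sy / sx            # regression of y in relation to x
    b_xy = r * sx / sy            # regression of x in relation to y
    # Major axis of the ellipses: tan 2*theta = 2 r sx sy / (sx^2 - sy^2).
    theta = 0.5 * math.atan2(2.0 * r * sx * sy, sx * sx - sy * sy)
    phi1 = math.atan(b_yx)        # line y = b_yx * x
    phi2 = math.atan(1.0 / b_xy)  # line x = b_xy * y, written y = x tan(phi2)
    return b_yx, b_xy, theta, phi1, phi2

b_yx, b_xy, theta, phi1, phi2 = regression_geometry(sx=2.0, sy=1.5, r=0.6)
geometric_mean = math.sqrt(b_yx * b_xy)   # equals r
```

With these values the three angles are about 24, 32 and 51 degrees, so that the major axis lies between the two lines of regression.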

It should be noticed that the surface is completely determined by five quantities, viz., two averages, two standard deviations, and one correlation coefficient.


Rectilinear  Regression. 

We  have  found  that  under  certain  conditions,  of  a  simple 
nature  and  dependent  mainly  on  plurality  of  causation,  the 
line  of  regression,  that  is  the  locus  of  the  averages  of  one 
variable  (y)  for  given  values  of  the  other  (x),  is  straight  and 
passes  through  the  position  representing  the  general  averages. 

If  the  conditions  are  not  rigidly,  but  only  approximately, 
satisfied,  there  is  a  presumption  that  rectilinearity  of  regres¬ 
sion  will  be  approximately  attained. 

It may well happen that regression is still approximately rectilinear even if the variables $x$ and $y$ are not normally distributed, and that the equation $y = r\dfrac{\sigma_y}{\sigma_x}x$ may still be the equation of regression, though the surface of distribution is no longer determinable from the value of $r$.

Let there be $n_s$ values of $y$ in the array corresponding to a value $x_s$ of $x$, and let their average be $\bar{y}_s$. Write $m_s = \dfrac{\bar{y}_s}{x_s}$, so that $m_s$ is the gradient of the line of regression determined from one group only.

Then it is shown above (p. 355) that $\tan\phi_1 = r\dfrac{\sigma_y}{\sigma_x}$ is a weighted average of $m_1$, $m_2$ ... $m_s$ ..., the weights being $n_sx_s^2$ etc.

Let $ON = x_s$, $NP_s$ be any value of $y$ found with $x_s$, and $NQ_s = \bar{y}_s$ be the average of $n_s$ such values. Let a line $y = ax + b$ meet $NP_s$ at $R_s$.

Then Mr. Yule shows (Statistical Journal, 1897, pp. 817-8) that the sum of all values of $(R_sP_s)^2$, i.e., $S[y_s - (ax_s + b)]^2$, extended over all values of $x_s$, is least when $b = 0$ and $a = r\dfrac{\sigma_y}{\sigma_x}$. This method depends on "least squares," for which see Appendix, Note 10.

This line $y = r\dfrac{\sigma_y}{\sigma_x}x$ then passes through the observations in such a way that the sum of the squares of the distances of the points representing the observations, measured from it parallel to the axis of $y$, is a minimum. This line is, then, whatever the distribution, a good single representation of regression.

We can proceed a step further, if we assume that the dispersion of the $y$'s in any $s$th array is independent of the value of $x_s$, and is always $\sigma^2$. For if the averages tend to lie on a straight line, and only fail to do so exactly because of the paucity of observations, then the deviation from the average, namely $R_sQ_s$, has a curve of frequency $Ke^{-\frac{(R_sQ_s)^2}{2\sigma_s^2}}$ where $\sigma_s^2 = \dfrac{\sigma^2}{n_s}$ (p. 312).

Hence the joint probability of deviations $R_1Q_1$, $R_2Q_2$ ... is $K'e^{-\frac{1}{2\sigma^2}S\,n_s(R_sQ_s)^2}$, and this is greatest when $S\,n_s(R_sQ_s)^2$ is least.

$$S\,n_s(R_sQ_s)^2 = S[n_s\{\bar{y}_s - (ax_s + b)\}^2]$$

$$= S(n_s\bar{y}_s^2) + a^2S(n_sx_s^2) + Nb^2 - 2aS(n_s\bar{y}_sx_s) - 2bS\,n_s\bar{y}_s + 2abS\,n_sx_s,$$

where $N = n_1 + n_2 + \dots + n_s + \dots$

Here $S\,n_s\bar{y}_s = 0 = S\,n_sx_s$ since the origin is the double average.

$S(n_sx_s^2) = S\,x^2 = N\sigma_x^2$, $S(n_s\bar{y}_sx_s) = S\,xy = Nr\sigma_x\sigma_y$, and the expression equals

$$S\,n_s\bar{y}_s^2 + N(a\sigma_x - r\sigma_y)^2 + Nb^2 - Nr^2\sigma_y^2.$$

This is least when $b = 0$, and $a = r\dfrac{\sigma_y}{\sigma_x}$ as before.

The line $y = r\dfrac{\sigma_y}{\sigma_x}x$ is then the most probable locus of regression, if we assume rectilinearity and independence between deviation in an array and the corresponding value of $x$, the deviations from rectilinearity being due to fewness of observations.

The value of $S\,n_s(R_sQ_s)^2$ reckoned from this line is $S\,n_s\bar{y}_s^2 - Nr^2\sigma_y^2$.
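
The arithmetic of this result may be sketched as follows. The observations are grouped in arrays (one list of $y$'s for each value of $x$), all measurements are reduced to deviations from the double average, and the gradient $a$ and the residual $S\,n_s(R_sQ_s)^2$ are computed. The data and the function name are invented for illustration only.

```python
import math

def line_through_arrays(arrays):
    """arrays maps each x_s to the list of y values observed with it.
    Returns (a, r, sigma_x, sigma_y, residual, weighted_mean_squares)."""
    points = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sigma_x = math.sqrt(sxx / n)
    sigma_y = math.sqrt(syy / n)
    r = sxy / (n * sigma_x * sigma_y)
    a = sxy / sxx                      # least-squares gradient, b = 0
    residual = 0.0                     # S n_s (R_s Q_s)^2
    weighted_mean_squares = 0.0        # S n_s ybar_s^2
    for x, ys in arrays.items():
        ybar = sum(ys) / len(ys) - mean_y
        residual += len(ys) * (ybar - a * (x - mean_x)) ** 2
        weighted_mean_squares += len(ys) * ybar ** 2
    return a, r, sigma_x, sigma_y, residual, weighted_mean_squares

data = {1: [2, 3], 2: [3, 5, 4], 4: [5, 9], 5: [9, 8, 10]}   # hypothetical
a, r, sigma_x, sigma_y, residual, wms = line_through_arrays(data)
```

The gradient so found agrees with $r\sigma_y/\sigma_x$, and the residual with $S\,n_s\bar{y}_s^2 - Nr^2\sigma_y^2$, whatever figures are entered.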

If,  however,  there  is  nothing  in  the  genesis  of  the  measure¬ 
ments,  or  in  their  results,  to  justify  the  assumption  of  recti¬ 
linearity,  r  ceases  to  be  an  intelligible  measurement  of  the 
amount  or  degree  of  commonness  of  causation,  though  it  may 
still  be  a  useful  function  of  the  quantities  in  analysis. 


The  Correlation  Ratio. 

To obtain a measurement completely independent of assumptions about distribution of the observations, Professor Pearson has devised the correlation ratio (Drapers' Company Research Memoirs, Biometric Series, II., 1905).

Let ${}_s\sigma_y$ be the standard deviation of the $s$th array, so that $n_s \cdot {}_s\sigma_y^2 = S(y_s - \bar{y}_s)^2$, and write $\sigma_a^2$ for the weighted mean of ${}_1\sigma_y^2$, ${}_2\sigma_y^2$ ... so that $N\sigma_a^2 = S(n_s \cdot {}_s\sigma_y^2) = SS(y_s - \bar{y}_s)^2$, the inner summation being extended over an array, and the outer indicating the sum over all the arrays.

Write $N\sigma_m^2 = S(n_s\bar{y}_s^2)$, so that $\sigma_m^2$ is the weighted mean square of the averages of the arrays.

Then $N\sigma_y^2 = S(y^2)$, the summation being extended over all values of $y$, and

$$N\sigma_a^2 = S\{S(y_s^2) - 2\bar{y}_sS\,y_s + n_s\bar{y}_s^2\} = S(S\,y_s^2 - n_s\bar{y}_s^2) = S(y^2) - S(n_s\bar{y}_s^2).$$

$\therefore\ \sigma_y^2 = \sigma_m^2 + \sigma_a^2$, as is otherwise evident.

$$\text{Now write } \eta = \frac{\sigma_m}{\sigma_y} = \sqrt{1 - \frac{\sigma_a^2}{\sigma_y^2}} \qquad (109)$$

$\eta$ is then called the correlation ratio. It is the ratio of the scattering of the averages of the arrays to the scattering of the group not regimented into arrays.

$\eta = 0$ only if $\sigma_m = 0$, and therefore if every $\bar{y}_s = 0$; that is, if the average of every array is coincident with the general average of the group.

$\eta = 1$ only if $\sigma_a = 0$, that is if every ${}_s\sigma_y = 0$, and the terms in each array are concentrated at a single point, $\bar{y}_s$.

Otherwise $1 > \eta > 0$.

In normal correlation every ${}_s\sigma_y = \sigma_y\sqrt{1 - r^2}$ (formula (107)), and then $\sigma_a^2 = \sigma_y^2(1 - r^2)$, and $\eta^2 = r^2$.

In other cases we have, as shown above,

$$S\,n_s(R_sQ_s)^2 = N\sigma_m^2 - Nr^2\sigma_y^2 = N(\eta^2 - r^2)\,\sigma_y^2,$$

$$\therefore\ \eta^2 - r^2 = \frac{S\,n_s(R_sQ_s)^2}{N\sigma_y^2} \qquad (110)$$

and $|\eta| > |r|$, unless every $R_sQ_s$ is 0 and the means of the arrays all lie on the line $y = r\dfrac{\sigma_y}{\sigma_x}x$.
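
A short computation shows how $\eta$ may be large when $r$ is negligible. The sketch below follows the definitions leading to formula (109); the data are invented so that the averages of the arrays lie on a curve and not on a straight line, and the function name is an assumption of this illustration.

```python
import math

def correlation_ratio_and_r(arrays):
    """arrays maps each x_s to the list of y values observed with it.
    Returns (eta, r): eta = sigma_m / sigma_y as in formula (109)."""
    points = [(x, y) for x, ys in arrays.items() for y in ys]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)       # N * sigma_y^2
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # N * sigma_m^2: weighted squares of the array averages (as deviations).
    between = sum(len(ys) * (sum(ys) / len(ys) - mean_y) ** 2
                  for ys in arrays.values())
    eta = math.sqrt(between / syy)
    r = sxy / math.sqrt(sxx * syy)
    return eta, r

curved = {-2: [4, 5], -1: [1, 2], 0: [0, 1], 1: [1, 2], 2: [4, 5]}  # hypothetical
eta, r = correlation_ratio_and_r(curved)
```

Here $r$ vanishes by symmetry while $\eta^2$ is 28/30.5, or about .92, so that the excess $\eta^2 - r^2$ of formula (110) measures the failure of rectilinearity.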

We  may  now  sum  up  the  treatment  of  correlation  so  far. 
If  (x,y)  is  a  pair  of  measurements  (from  their  averages) 
of  two  variables  (related  in  space,  in  time,  in  a  thing  or  in 
an  organism),  and  if  when  x  is  given  as  positive  (or  negative) 
there  is  a  presumption  that  y  is  positive  (or  negative),  or  a 
presumption  that  y  is  negative  (or  positive),  then  the  variables 

are said to be correlated. In such a case $\dfrac{1}{n}S\,xy$ does not tend to zero when $n$ is increased, but to a limit written as $r\sigma_x\sigma_y$. $r = 0$, $= 1$, $= -1$ have definite meanings; $r$ is sensitive to all kinds of relationship between $x$ and $y$. In general it may be expected to be the greater as $\sigma_a$ (the mean scattering within the arrays) is less. If $x$ and $y$ are each the sum of $p + q$ independent elements of which $p$ (only) are common to $x$ and $y$, then $r$ equals $p/(p + q)$, if the standard deviations of the elements are equal. If $x$ and $y$ are generated linearly from a multiplicity of independent causes (some of them common to $x$ and $y$), then $r$ defines the whole frequency distribution of the pairs, the regression loci are rectilinear, and their equations are $y = r\dfrac{\sigma_y}{\sigma_x} \cdot x$, and $x = r\dfrac{\sigma_x}{\sigma_y} \cdot y$. If the normal frequency surface cannot be assumed, but regression is rectilinear, the same equation is a good empirical statement of regression. If nothing can be postulated as to the distribution of $x$ and $y$ or the averages of the arrays, the meaning of the numerical value of $r$ is undefined (as is always the case with $\eta$ when it is not 0 or 1). In general, however, $r$ may be said to measure the amount that is common in the systems of causation of $x$ and $y$.

Correlation  between  Ungraded  Variables. 

The  measurement  of  correlation  by  the  methods  so  far 
discussed  is  only  possible  if  we  have  adequate  detailed  observa¬ 
tions.  Cases  of  great  interest  arise  when  such  detail  is  not 
forthcoming. 

Colour of Hair.

                      Parent.
    Son.        Light.    Dark.    Totals.
    Dark          a         b        n1
    Light         c         d        n2
    Totals        m1        m2       N

Suppose that sons and parents are separated according to the colour of their hair, distinguished as light or dark; and that of $m_1$ sons of light-haired parents $c$ have light hair and the remaining $a$ dark; while of $m_2$ sons of dark-haired parents, $d$ have light hair and the remaining $b$ dark. Let $a + b = n_1$, $c + d = n_2$, and $n_1 + n_2 = N = m_1 + m_2$.

Required  to  determine  from  these  data  whether  there  is 
a relationship between hair-colour of sons and parents, and, if so, to measure it.

If in such a case normal distribution of the variable (say, amount of pigment) and normal correlation can be assumed, the problem is determinate. For the ratios $m_2 : N$, and $n_1 : N$ give (by inverse use of the table, p. 271) the abscissae on the scales of pigment which correspond to the division between light and dark; for any given value of $r$ the fraction of the correlation surface bounded by planes through these abscissae is known, and the equation of the fraction $b/N$ to this is, conversely, an equation for $r$.

The  necessary  analysis  is  given  by  Pearson  (Phil.  Trans.  A, 
Vol.  CXCV,  pp.  1  seq.)  and  Elderton  (Frequency  Curves, 
Chapter  VII)  and  results  in  a  troublesome  equation  for  r 
which  can  be  solved  approximately. 

If  we  have  control  of  the  data  and  can  make  both  separa¬ 
tions  at  the  median,  a  simple  solution  can  be  given. 

Suppose  that  intelligence  in  arithmetic  and  in  algebra 
is  normally  distributed.  Arrange  a  large  class  of  N  boys 
in  order  of  intelligence  (as  known  by  marks  or  otherwise)  in 
arithmetic  ;  now  mark  also  their  order  in  respect  of  algebra, 
and  suppose  that  b  are  found  above  the  median  in  both 
respects,  c  below  in  both,  d  above  in  arithmetic  and  below 
in  algebra,  and  a  above  in  algebra  and  below  in  arithmetic. 
It  is  not  assumed  that  intelligence  is  measured,  but  only  that 
an  order  can  be  assigned. 


Then $a + b = \dfrac{N}{2} = c + d = a + c = b + d$,

$a = d = (\tfrac{1}{4} - q)N$, say, and $b = c = (\tfrac{1}{4} + q)N$.

It can be shown as follows that $r = \sin 2\pi q$.

Take the standard deviation on each scale to be unity.

Let the required surface be $\dfrac{1}{2\pi\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}(x^2 + y^2 - 2rxy)}$.

The principal axis of the surface then makes $\dfrac{\pi}{4}$ with the axis of $x$. $(\tfrac{1}{4} + q)$ equals the volume in the doubly-positive quadrant bounded by the planes $y = 0$, $x = 0$, and these planes cut off each of the similar elliptic horizontal sections $(\tfrac{1}{4} + q)$ of their area. Take the ellipse $x^2 + y^2 - 2rxy = 1$.

Hence in the figure

Elliptic area CPO $= (\tfrac{1}{4} + q)$ of area of ellipse.

Let $\theta$ be the eccentric angle of P.

The ellipse referred to its principal axes is

$$(1 - r)x^2 + (1 + r)y^2 = 1.$$

$$\frac{\tan\theta}{\tan\frac{\pi}{4}} = \frac{\text{major axis}}{\text{minor axis}} = \sqrt{\frac{1 + r}{1 - r}}.$$

$$\therefore\ r = -\cos 2\theta.$$

$$\tfrac{1}{4} + q = \frac{2\ \text{area CPA}}{\text{area of ellipse}} = \frac{2\theta}{2\pi}.$$

$$\therefore\ 2\pi q = 2\theta - \frac{\pi}{2}$$

and $\sin 2\pi q = -\cos 2\theta = r$.

E.g., if 40 % of the boys were found to be above the median in both

$$\tfrac{1}{4} + q = .4, \quad q = .15, \quad r = \sin\frac{3\pi}{10} = .81.$$

If $q = 0$, $r = 0$; if $q = \tfrac{1}{4}$, $r = 1$.

In the table relating to 83 boys given by Mr. W. Brown (Biometrika, Vol. VII., p. 366), 11 boys are above the median in algebra, but below in arithmetic. Here $q = \tfrac{1}{4} - \tfrac{11}{83} = .12$, and $r = .68$. Mr. Brown using the complete order (and not merely the median) obtains .65, and using the marks obtains .79. All need correction, given by Mr. Brown, for age and position in school.
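
The rule $r = \sin 2\pi q$ is easily applied by machine. The two small functions below are a sketch only, their names being introduced for this illustration; the first takes the fraction found above the median in both respects, the second the number found above in one respect and below in the other.

```python
import math

def r_from_fraction_above_both(fraction):
    """fraction = 1/4 + q, the proportion above the median in both respects."""
    q = fraction - 0.25
    return math.sin(2.0 * math.pi * q)

def r_from_discordant_count(a, total):
    """a = number above the median in one respect but below in the other,
    so that a / total = 1/4 - q."""
    q = 0.25 - a / total
    return math.sin(2.0 * math.pi * q)

r_example = r_from_fraction_above_both(0.40)   # the 40 % case of the text
r_brown = r_from_discordant_count(11, 83)      # Mr. Brown's 83 boys
```

For the 83 boys the unrounded $q$ is .1175, which gives $r$ nearly .67; the figure .68 of the text follows when $q$ is first rounded to .12.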

If normality of distribution of the two attributes cannot
be  postulated,  the  problem  of  measurement  of  the  amount 
of  correlation  becomes  indeterminate,  and  a  number  of  methods 
have  been  tried. 


Association. 


The expected number of dark-haired fathers with dark-haired sons, if there were no causal connection, would be $\dfrac{m_2}{N} \times \dfrac{n_1}{N} \times N = \beta$, where out of $N$ cases $n_1$ sons and $m_2$ parents were dark-haired.

The notation of p. 367 being adopted, and $\alpha$, $\gamma$, $\delta$ being the number in the $a$, $c$, $d$ compartments that would be the most probable in a chance arrangement, we have

$$a + b = \alpha + \beta = n_1, \quad a + c = \alpha + \gamma = m_1 \text{ etc.}$$

$$\text{and } \alpha - a = b - \beta = c - \gamma = \delta - d = b - \frac{m_2n_1}{N}$$

$$= \frac{b(a + b + c + d) - (b + d)(a + b)}{N} = \frac{bc - ad}{N} = qN, \text{ say.}$$

Then $q$ is a measure of association, but no definite meaning can be given to it except in extreme cases.

Instead of $q$, Mr. Yule takes $Q = \dfrac{bc - ad}{bc + ad}$ (the "coefficient of association") or $\omega = \dfrac{\sqrt{bc} - \sqrt{ad}}{\sqrt{bc} + \sqrt{ad}}$ (the "coefficient of colligation") as measurements. (See Introduction to Theory of Statistics, p. 37, and Statistical Journal, 1912, p. 593.) $Q = \omega = 0$, if $bc = ad$, and $q = 0$, the case of no association; and $Q = \omega = 1$ if $a$ or $d$ is zero, and $-1$ if $b$ or $c$ is zero, which cases correspond to the maximum of association on this method.
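
The three measures may be computed together from the four entries. The sketch below keeps the lettering of the table on p. 367 (so that $b$ and $c$ are the compartments in which the attributes agree); the function name is an assumption of this illustration.

```python
import math

def association_measures(a, b, c, d):
    """Returns (q, Q, omega) for a fourfold table lettered as in the text:
    q = (bc - ad) / N^2, Q = (bc - ad) / (bc + ad),
    omega = (sqrt(bc) - sqrt(ad)) / (sqrt(bc) + sqrt(ad))."""
    n_total = a + b + c + d
    q = (b * c - a * d) / n_total ** 2
    big_q = (b * c - a * d) / (b * c + a * d)
    omega = ((math.sqrt(b * c) - math.sqrt(a * d))
             / (math.sqrt(b * c) + math.sqrt(a * d)))
    return q, big_q, omega

q, big_q, omega = association_measures(a=65, b=235, c=35, d=165)
```

It will be observed that $Q$ and $\omega$ always have the same sign as $q$, and that $\omega$ is numerically the smaller except in the extreme cases.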

These  coefficients  have  therefore  definite  meanings  in 
extreme  cases,  but  the  meaning  of  (e.g.)  Q  —  §  can  only  be 
appreciated  by  the  examination  of  numerous  instances,  and 
in the end it can hardly be affirmed that a greater Q means
a  greater  amount  of  “  association,”  for  no  definite  measurable 
meaning  has  been  given  to  the  term  “  association.” 
Contingency.

If,  instead  of  trying  to  find  the  amount  of  association,  we 
ask  for  evidence  of  its  existence ,  that  is  whether  the  observa¬ 
tions  could  arise  if  the  attributes  were  independent,  we  are 
on  surer  ground. 

If $p : 1 - p$ is the ratio of dark to light-haired among sons in general, then from the observations $p = \dfrac{n_1}{N}$ is the best value we can assign.

Hence the chance that, if $N$ sons were divided arbitrarily (e.g. according as their Christian names began with A to K or L to Z) into two groups containing respectively $m_1$ and $m_2$ of them, $a$ would be found in the first group is that discussed above (pp. 282-4), and may be written

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}} \quad \{\kappa \text{ being neglected}\},$$

where

$$x = a - pm_1 = a - \frac{n_1m_1}{N} = a - \alpha = -qN,^*$$

$$\sigma^2 = p(1 - p)\,m_1\left(1 - \frac{m_1}{N}\right) = \frac{n_1}{N} \cdot \frac{n_2}{N} \cdot m_1 \cdot \frac{m_2}{N}.$$

The chance that so great a deviation, positive or negative, as $(a - \alpha)$ should occur is

$$2\int_z^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}\,dz, \quad\text{where } z = \frac{x}{\sigma}.$$

Notice that

$$\frac{x^2}{\sigma^2} = \frac{q^2N^5}{n_1n_2m_1m_2} = q^2N^3\,\frac{(n_1 + n_2)(m_1 + m_2)}{n_1n_2m_1m_2} = q^2N^3\left(\frac{1}{n_1m_1} + \frac{1}{n_1m_2} + \frac{1}{n_2m_1} + \frac{1}{n_2m_2}\right)$$

$$= \frac{(a - \alpha)^2}{\alpha} + \frac{(b - \beta)^2}{\beta} + \frac{(c - \gamma)^2}{\gamma} + \frac{(d - \delta)^2}{\delta} = \chi^2, \text{ say.}$$

E.g. in the distribution

     65    235
     35    165

$n_1 = 300$, $n_2 = 200$, $m_1 = 100$, $m_2 = 400$, $N = 500$,

$\sigma^2 = 19.2$, $\alpha = 60$, $x = -qN = 5$,

$\dfrac{x}{\sigma} = 1.14$, $F(1.14) = .373$ (p. 271), and $2\{\tfrac{1}{2} - F(1.14)\} = .254$.


* $q$ has the same meaning as in the previous paragraph and is not $1 - p$ as in Chapter II.
The chance against obtaining 65 or more or 55 or less, when 100 are selected out of 500, and the chance in one selection is 300 : 500, is .746 to .254 or about 3 to 1.
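
The figures of this example can be reproduced as follows. The sketch uses the error function of the Python standard library in place of the table on p. 271, the chance of so great a deviation in either direction being $\operatorname{erfc}(z/\sqrt{2})$; the function name is an assumption of this illustration.

```python
import math

def fourfold_chance(a, b, c, d):
    """Returns (alpha, variance, z, chance, chi2) for the fourfold table
    a b / c d, with n1 = a + b, n2 = c + d, m1 = a + c, m2 = b + d."""
    n1, n2 = a + b, c + d
    m1, m2 = a + c, b + d
    total = n1 + n2
    alpha = n1 * m1 / total                      # most probable value of a
    variance = n1 * n2 * m1 * m2 / total ** 3    # sigma^2 of the text
    z = abs(a - alpha) / math.sqrt(variance)
    chance = math.erfc(z / math.sqrt(2.0))       # 2 * {1/2 - F(z)}
    expected = (alpha, n1 * m2 / total, n2 * m1 / total, n2 * m2 / total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((a, b, c, d), expected))
    return alpha, variance, z, chance, chi2

alpha, variance, z, chance, chi2 = fourfold_chance(65, 235, 35, 165)
```

The value of $\chi^2$ formed from the four compartments agrees with $x^2/\sigma^2$, as the identity given above requires.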

Given $n_1$, $m_1$, $N$ and $a$, the remaining numbers $b$, $c$, $d$, $n_2$, $m_2$ are known, and the chance just found is equally the chance affecting any one of the numbers $b$, $c$, $d$ taken independently of each other. It should not be spoken of as the chance of the distribution as a whole; to find this we should need to know the chance $p$ from a wider universe, and not as $\dfrac{n_1}{N}$ determined from a limited number of observations, and also the general chance to which $\dfrac{m_1}{N}$ is the approximation.

To  illustrate  this  difficulty,  we  will  consider  a  problem 
that  has  often  been  discussed. 


                    Not
                 vaccinated.   Vaccinated.   Totals.
    Recovered         a             b           n1
    Died              c             d           n2
    Totals            m1            m2          N

In an epidemic of smallpox the number of cases is $N$, of whom $m_2$ were vaccinated, $n_2$ died, and other categories are as shown in the table.

The recovery rate, as shown by the whole statistics, is $\dfrac{n_1}{N}$, and if vaccination (whether directly, or by the other attributes correlated with it) had nothing to do with recovery, then the chance of a vaccinated or an unvaccinated patient's recovery would be $\dfrac{n_1}{N}$, and the chance that as many as $b$ recover out of $m_2$ vaccinated is

$$\int_{b - \beta}^{\infty} \sqrt{\frac{N^3}{2\pi n_1n_2m_1m_2}}\; e^{-\frac{N^3x^2}{2n_1n_2m_1m_2}}\,dx,$$

the integral being taken from $x = b - \beta$; if this is small there is evidence of a relation between vaccination (or the circumstances that lead to it) and recovery.

The rate $\dfrac{n_1}{N}$, however, is subject to a standard deviation of $\sqrt{\dfrac{n_1n_2}{N^3}}$, and unless this is negligible its effect on the computed result should be tested.

A measure of the advantage (or disadvantage) of vaccination (apart from evidence of the existence of some effect) could conceivably be obtained by comparing the recovery rates $\dfrac{b}{m_2}$ and $\dfrac{a}{m_1}$, but apart from the statement of these rates and their standard deviations there is no direct method of procedure.

The  question  of  the  existence  and  of  the  measurement  of 
association  becomes  more  complicated  when,  instead  of  simple 
alternatives  in  each  attribute,  we  have  several  different  classes, 
for  example  several  grades  of  hair  colour  both  in  father  and  son. 

Professor Pearson has introduced the "coefficient of contingency" for the measurement of such a case (Drapers' Company Research Memoirs, Biometric Series I, 1904; Elderton, Chapter X).

Numbers of Observations.

                                   Classes of First Attribute.
    Classes of Second Attribute.     a1     a2    ...    n1
                                     b1     b2    ...    n2
                                     c1     c2    ...    n3
                                     ...
                                     m1     m2    ...    N

Let $n_1$, $n_2$ ..., $m_1$, $m_2$ ... be the totals of lines and columns as in the table, and $N$ the total number of observations.

Then if $n_1/N$, $m_1/N$ are assumed to be the accurate proportions of the first class in each attribute to the total, the most probable number to be found in the position of $a_1$, if there was no association, would be $\alpha_1 = \dfrac{n_1}{N} \times \dfrac{m_1}{N} \times N$. Similarly values $\alpha_2$ ..., $\beta_1$, $\beta_2$ ..., $\gamma_1$, $\gamma_2$ ... can be computed for the other places.

The divergences $a_1 - \alpha_1$, $a_2 - \alpha_2$, ... $b_1 - \beta_1$ ... afford some measure of association. Since an excess or defect is equally probable, it is convenient to take the squares, $(a_1 - \alpha_1)^2$ etc., instead of the linear quantities.

Analogy with the case of four categories suggests the formation of the function

$$\chi^2 = \frac{(a_1 - \alpha_1)^2}{\alpha_1} + \frac{(a_2 - \alpha_2)^2}{\alpha_2} + \dots + \frac{(b_1 - \beta_1)^2}{\beta_1} + \dots$$
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This  function  also  has  a  place  in  the  measurement  of  the 
appropriateness  of  a  formula  to  represent  given  observations 
(formula  (130)). 

The coefficient of contingency is then defined as

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (112)$$

When $\chi = 0$ and there is no association, $C = 0$.

As $\chi$ increases, $C$ increases from 0, and tends towards 1 as $\chi^2/N$ becomes great.

$$\frac{\chi^2}{N} = \frac{(a_1 - \alpha_1)^2}{n_1 m_1} + \frac{(a_2 - \alpha_2)^2}{n_1 m_2} + \ldots$$

and depends only on ratios, not on the whole number measured.

It can be shown (Elderton, p. 147) that if the numbers $a_1, a_2 \ldots b_1$ etc. are those which occur in appropriate divisions of a normal correlation surface, and if the number of divisions is large, while N is so large that the smallest of the entries is not less than a small integer, then C approximates to r, the coefficient of correlation. This relation appears to have suggested the form of the function of $\chi^2$ which defines C.

The value obtained for C differs according to the number of divisions taken, and this consideration diminishes its utility as a measurement; but a method has been given by which this difficulty can be overcome (Biometrika, Vol. IX, pp. 116-139).

The  significance  of  particular  values  of  C  can  only  be 
appreciated  by  experience  of  many  cases. 

It  should  be  noticed  that  C,  and  the  analysis  of  the  previous 
paragraph,  can  be  applied  to  cases  where  classes  can  be  defined, 
but  have  no  measurable  attributes. 
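
The computation of the coefficient of contingency can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the two small tables are hypothetical, the expected number in each cell is taken as (line total × column total) ÷ N, and C is taken as the square root of χ² ÷ (N + χ²). Every line and column total is assumed to be greater than zero.

```python
import math

def contingency_coefficient(table):
    """Return (chi2, C) for a rectangular table of observed counts.

    Assumes every line total and column total is positive.
    """
    row_totals = [sum(row) for row in table]          # n1, n2, ...
    col_totals = [sum(col) for col in zip(*table)]    # m1, m2, ...
    total = sum(row_totals)                           # N
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # most probable number if there were no association
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.sqrt(chi2 / (total + chi2))

# Hypothetical tables: the first has exactly proportional lines,
# the second shows a marked association.
chi2_ind, c_ind = contingency_coefficient([[10, 20], [30, 60]])
chi2_assoc, c_assoc = contingency_coefficient([[30, 10], [10, 30]])
print(chi2_ind, c_ind)
print(chi2_assoc, round(c_assoc, 3))
```

In the second table every expected entry is 20, so that χ² = 4 × 100 ÷ 20 = 20 and C = √(20 ÷ 100), or about .45.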


Correlation  of  Time  Series. 

So far we have been concerned with the correlation between two statistical groups where the measurements all relate to the same time; we have still to consider how to test the relation between two series, where the pairs $x_1, y_1 \ldots x_p, y_p \ldots x_t, y_t$ are measurements of quantities at successive intervals. Here it is generally the case that one value of x is not independent of those that come before or after it, and the relationship found between x and y may merely reflect a general or periodic progress in time and not any more intimate connection. Nearly all series in time have a trend, and these trends, whether equally rapid or not, or in the same direction or not, will yield a high correlation coefficient even if the quantities are otherwise independent. For example the coefficient between

1, 2, 3 ... 20

100, 98, 96 ... 62

where $x_t = t$ and $y_t = 100 - 2(t - 1)$, is -1.

We  need  to  find  the  correlation  between  the  deviations 
after  the  time  element  is  eliminated. 
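
The effect of a common trend can be checked directly. The sketch below (the helper `correlation` is a hypothetical name, written out here rather than taken from any library) computes the ordinary product-moment coefficient for the two series just quoted, which contain nothing but trend.

```python
import math

def correlation(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 21))                   # 1, 2, 3 ... 20
ys = [100 - 2 * (t - 1) for t in xs]      # 100, 98, 96 ... 62
r = correlation(xs, ys)
print(round(r, 6))
```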

One  method  *  is  to  obtain  smoothed  lines  for  each  quantity, 
as  above  (pp.  132  seq.),  to  compute  the  differences  between 
the  observations  of  each  year  and  the  values  given  by  the 
smoothed  line,  and  to  treat  these  differences  as  the  quantities 
whose  correlation  should  be  measured  ;  i.e.  to  measure  the 
correlation  between  such  quantities  as  those  represented  on 
the  diagram  (facing  p.  155). 

If  the  series  are  markedly  periodic,  the  result  would  be  only 
to  bring  out  the  correlation  due  to  the  periodicities,  and  these 
are  better  studied  by  harmonic  analysis.  And  if  the  series 
are  strongly  “  compensated  ”  (p.  148),  so  that  a  positive  devia¬ 
tion  is  generally  followed  by  a  negative  one,  the  correlation 
would  reflect  this  symptom. 

But  if  the  oscillations  are  random,  so  that  apart  from  a 
regular  trend  the  measurement  of  one  year  is  unrelated  to 
the  measurement  of  adjacent  years,  the  coefficient,  calculated 
between  the  deviations  from  the  smoothed  lines,  measures  the 
same  kind  of  relation  as  that  already  discussed  in  the  correla¬ 
tion  of  groups. 

Let $x_p$, $y_p$ be a pair of measurements in the pth year, and let $\bar{x}_p$, $\bar{y}_p$ be the averages of the measurements of m years of which the pth is central, m being odd. Then the correlation coefficient to be calculated is that between $x_p - \bar{x}_p$ and $y_p - \bar{y}_p$.

* See Hooker, Statistical Journal, 1901, pp. 485 seq.

The method can also be applied if the smoothing is effected by the method recommended by Professor Persons (The Review of Economic Statistics, No. 1, 1919, Harvard University Press). In this method the average $\bar{y}$ is calculated for m years during which the trend appears to be in one direction and without any marked change of gradient, and the smoothed line is assumed to be of the form $y = \bar{y} + kt$, where t is the number of years from the centre of the period. k is determined by the condition that the sum of the squares of the deviations from this line shall be a minimum, viz., that $S\{y_t - (\bar{y} + kt)\}^2$ is a minimum, where $y_t$ is the observation t years from the centre. Then

$$k = \frac{S\,t y_t}{S\,t^2}.$$

If $m = 2n + 1$, $S t^2 = 2(1^2 + \ldots + n^2) = \frac{n(n+1)(2n+1)}{3}$.

This  method  overcomes  partially  a  difficulty  that  occurs 
when  moving  averages  are  used  in  a  case  where  the  trend  is 
continually  concave  (or  convex),  and  the  averages  always 
below  (or  above)  the  observations. 
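
A minimal sketch of the straight-line fitting just described is given below. The function name and the five observations are hypothetical; the years are counted from the central year, so that the sum of t is zero and k reduces to the ratio of the sum of t × y to the sum of t².

```python
def centred_trend(ys):
    """Fit y = mean + k*t by least squares, t being counted from the central year.

    Returns the mean, the gradient k, and the deviations from the fitted line.
    An odd number of years is assumed.
    """
    m = len(ys)
    if m % 2 == 0:
        raise ValueError("an odd number of years is assumed")
    n = (m - 1) // 2
    ts = range(-n, n + 1)
    mean_y = sum(ys) / m
    k = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
    deviations = [y - (mean_y + k * t) for t, y in zip(ts, ys)]
    return mean_y, k, deviations

# Hypothetical observations for five successive years
mean_y, k, deviations = centred_trend([3, 5, 4, 8, 10])
print(mean_y, k, [round(d, 2) for d in deviations])
```

Here the sum of t² is 2(1² + 2²) = 10, in agreement with n(n + 1)(2n + 1) ÷ 3 for n = 2, and the deviations from the line sum to zero. It is these deviations, computed for each series, that would be correlated.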

Another method, introduced by Miss F. E. Cave (Royal Society's Proceedings, Vol. LXXIV, p. 407, 1904) and by Mr. Hooker (Statistical Journal, 1905, pp. 696 seq.) and more recently developed by Professor Karl Pearson, Miss B. M. Cave and others, is to correlate not the observations but the differences between successive observations. A period of m + 1 years is selected, in which the observations are $x_0, x_1 \ldots x_m$ and $y_0, y_1 \ldots y_m$, and the coefficient of correlation between the pairs $x_1 - x_0$, $y_1 - y_0$; $x_2 - x_1$, $y_2 - y_1$; ... $x_m - x_{m-1}$, $y_m - y_{m-1}$ is calculated.

Since the average of the quantities $x_1 - x_0$, $x_2 - x_1 \ldots$ is $\frac{1}{m}(x_m - x_0)$, the deviations from the average which alone enter into the formula for r are the excess (or defect) of the increment in a particular year over the mean increment, or they may be described as the annual variations from the trend. If the smoothed lines of x and y are markedly concave or convex the correlation will be dominated by this symptom, but if the observations oscillate in an irregular way about a straight line, we shall obtain a measure of correlation independent of the element of time.
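
The difference method may be illustrated as follows. The two series are hypothetical: each is a straight-line trend with a small oscillation superposed, and the oscillations were chosen to be nearly unrelated. The helper names are likewise assumptions made for the sketch.

```python
import math

def correlation(xs, ys):
    """Product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Differences between successive observations."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

# Hypothetical series: linear trends plus small oscillations
x = [t + d for t, d in zip(range(10), [0, 1, 0, -1, 0, 1, 0, -1, 0, 1])]
y = [2 * t + d for t, d in zip(range(10), [1, 0, -1, 0, 1, 0, -1, 0, 1, 0])]

r_raw = correlation(x, y)
r_diff = correlation(first_differences(x), first_differences(y))
print(round(r_raw, 3), round(r_diff, 3))
```

The coefficient between the observations themselves is about .96, being almost wholly due to the trends, while that between the first differences is -.1.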

To get over the difficulty arising from concavity or convexity a more elaborate method, named by Professor Pearson that of "variate difference correlation," has been devised.* This is based on the assumption that $x_p$ can be expressed as $x_p = X_p + bt_p + ct_p^2 + \ldots$, where $X_p$ is independent of the influence of time, and the effect of the time can be expressed as a parabolic function: and similarly $y_p = Y_p + b't_p + \ldots$

* Biometrika, Vol. X, pp. 179 seq. and pp. 340 seq.

$$x_{p+1} - x_p = X_{p+1} - X_p + b + c(2t_p + 1) + \ldots$$

Mr. Hooker's method ignores c and further constants.

The second difference gives

$$x_{p+1} - 2x_p + x_{p-1} = X_{p+1} - 2X_p + X_{p-1} + 2c + 6dt_p + \ldots$$

and its use ignores d etc., assuming a strict parabolic form.

It  can  be  shown  that  when  time  is  eliminated  the  correlation 
between  any  differences  equals  the  correlation  between  Xp  and 
Yp.  The  process  is  complete  when  the  correlation  coefficient 
is  no  longer  affected  by  proceeding  to  a  further  difference. 

There is, however, a great difficulty in applying this method to any differences except perhaps the first three, owing to the want of precision or the small number of significant figures in ordinary observations. The effect can be seen if we take the squares of the numbers 2.6, 2.7 ... when written only to the first decimal place.

6.8, 7.3, 7.8, 8.4, 9.0, 9.6, 10.2, 10.9

The second differences if written completely are all .02.

The  method  is,  in  fact,  too  refined  for  ordinary  statistical 
observations. 
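
The loss of precision can be verified numerically. In the sketch below the squares are held as whole numbers of hundredths, and the rounded values as whole numbers of tenths, so that no error of machine arithmetic enters; the variable names are hypothetical.

```python
def differences(values):
    """Differences between successive terms."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Squares of 2.6, 2.7 ... 3.3, in units of .01 (676 stands for 6.76)
exact_hundredths = [n * n for n in range(26, 34)]
# The same squares written to the first decimal place, in units of .1
rounded_tenths = [(v + 5) // 10 for v in exact_hundredths]

second_exact = differences(differences(exact_hundredths))    # units of .01
second_rounded = differences(differences(rounded_tenths))    # units of .1
print(rounded_tenths)
print(second_exact)
print(second_rounded)
```

The complete second differences are all 2 hundredths, while those of the rounded figures are 0, .1, 0, 0, 0, .1, which bear no regular relation to the true value.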

The  difference  between  the  methods  may  be  exhibited  as 
follows. 

The x quantity that is correlated, if we use the fourth difference, is $6\{x_0 + \tfrac{1}{6}(x_2 + x_{-2}) - \tfrac{2}{3}(x_1 + x_{-1})\}$, where the suffixes mark the distance to right or left of the centre; here the extreme terms increase the expression.

If we take the moving average based on five terms, the quantity is

$$x_0 - \tfrac{1}{5}(x_{-2} + x_{-1} + x_0 + x_1 + x_2) = \tfrac{4}{5}\{x_0 - \tfrac{1}{4}(x_2 + x_{-2}) - \tfrac{1}{4}(x_1 + x_{-1})\}$$

and the extreme terms diminish the expression.

The eighth difference gives, apart from a constant factor,

$$x_0 - \tfrac{4}{5}(x_1 + x_{-1}) + \tfrac{2}{5}(x_2 + x_{-2}) - \tfrac{4}{35}(x_3 + x_{-3}) + \tfrac{1}{70}(x_4 + x_{-4})$$

while the moving average gives, in the same way,

$$x_0 - \tfrac{1}{8}(x_1 + x_{-1}) - \tfrac{1}{8}(x_2 + x_{-2}) - \tfrac{1}{8}(x_3 + x_{-3}) - \tfrac{1}{8}(x_4 + x_{-4}).$$

On the other hand, the second difference gives, apart from a constant factor,

$$x_0 - \tfrac{1}{3}(x_1 + x_0 + x_{-1}),$$

and  is  the  same  as  that  obtained  from  a  moving  average  based 
on  only  three  terms,  and  therefore  subject  to  a  very  considerable 
chance  error. 

The  various  methods  need  further  examination  and  more 
experience  of  their  applicability.  It  appears  that  the  moving 
average  does  not  give  the  right  importance  to  extreme  terms, 
while  the  difference  method  is  too  sensitive  to  the  effect  of 
roughness  in  observations.  In  either  case,  the  resulting 
measurement  of  correlation  depends  on  the  assumptions 
made,  and  is  not  so  easily  intelligible  as  in  the  measurement 
of  correlation  of  groups. 


Graphic  Comparison  of  Series. 

Apart  from  the  determination  of  a  measurement  of  correla¬ 
tion,  the  problem  arises  of  how  best  to  exhibit  the  relationship 
graphically. 

The following method is useful. Let $x_1, x_2 \ldots x_n$, $y_1, y_2 \ldots y_n$ be deviations from moving averages (as in the table on p. 387), or (if there is no trend) be actual measurements, and let $\bar{x}$ and $\bar{y}$ be their averages.

Make a graph of the values of y on any convenient scale, time being measured horizontally. Then the x values may be placed on this diagram on any scale and with any origin.

Take b as the origin for x, and let 1 unit of x correspond to c units of y. b and c are to be chosen. A convenient method is to select them so that the sum of the squares of the vertical distances between the points representing pairs such as $x_1$, $y_1$ shall be a minimum (Appendix, Note 10).

That is, $S\{c(x + b) - y\}^2$ is to be a minimum.

By differentiating with regard to b and to c, it is found that

$$c = \frac{S(x - \bar{x})(y - \bar{y})}{n\sigma_1^2} \quad \text{and} \quad c(\bar{x} + b) = \bar{y},$$

where $\sigma_1$ is the standard deviation of the x's.

The averages of the deviations should therefore be marked at the same point on the vertical scale, and the differences from their average of the x's should be multiplied by $r\sigma_2/\sigma_1$ and then measured on the y scale above and below the average, $\sigma_2$ being the standard deviation of the y's and r the coefficient of correlation.
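
The choice of b and c may be sketched as below. The function name and the four pairs are hypothetical, and it is assumed that the x's are not all equal and that the product sum is not zero, since otherwise c or b is indeterminate.

```python
def graph_scale(xs, ys):
    """Return (b, c) making the sum of {c(x + b) - y}^2 a minimum.

    Assumes the x's vary and that S(x - mean x)(y - mean y) is not zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)      # n times sigma_1 squared
    c = sxy / sxx
    b = mean_y / c - mean_x                       # from c(mean x + b) = mean y
    return b, c

# Hypothetical pairs
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
b, c = graph_scale(xs, ys)
plotted_x = [c * (x + b) for x in xs]             # x's expressed on the y scale
print(round(b, 4), round(c, 4), [round(v, 2) for v in plotted_x])
```

With these figures c = 11 ÷ 5 = 2.2, and the average of the plotted x's falls at 5, the average of the y's.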

An example is given in measuring unemployment in the Statistical Journal, 1912, pp. 799-800.

In this section the results of several experiments and observations are given to illustrate the theory discussed above, and to show the arithmetical working of the measurements.

It should be premised that the theoretical value of r would only be obtained exactly in an infinite number of observations. It is shown in the chapter on probable errors that r, as calculated from n pairs, may differ from its true value by an amount whose standard deviation measured on the normal scale of error is $\frac{1 - r^2}{\sqrt{n}}$. Thus in the first example the correlation coefficient is known to be .6; 24 pairs are taken, and we should expect to be within $\frac{1 - .6^2}{\sqrt{24}} = .13$ of .6, while it is very unlikely that the difference would amount to 3 times .13. Conversely, if we do not know the coefficient a priori, we must read with our calculated value $\pm\frac{1 - r^2}{\sqrt{n}}$.
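
The standard deviation of r is so often wanted in what follows that a one-line function is convenient. The name is hypothetical; the formula is the one just stated.

```python
import math

def standard_deviation_of_r(r, n):
    """Standard deviation of a correlation coefficient found from n pairs."""
    return (1 - r * r) / math.sqrt(n)

sd_24_pairs = standard_deviation_of_r(0.6, 24)       # the first example
sd_1000_pairs = standard_deviation_of_r(0.5, 1000)   # 1,000 pairs, r = 1/2
print(round(sd_24_pairs, 2), round(sd_1000_pairs, 3))
```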


Some  of  the  examples  are  intended  to  show  simply  the 
arithmetical  methods  of  working  out  r  from  the  observations. 

In others when the observations are numerous the averages of arrays are obtained and comparison is made with the equation $y - \bar{y} = r\frac{\sigma_2}{\sigma_1}(x - \bar{x})$, which is the locus of these averages if regression is rectilinear.

In the final example the distribution of 1,000 pairs is compared in detail with the distribution given by the theoretical correlation surface.

In general x and y are measured not from their averages but from an arbitrary origin and then $r = \frac{Sxy - n\bar{x}\bar{y}}{n\sigma_1\sigma_2}$ by formula (93).

Example 1. To obtain a simple illustration of correlation when all the circumstances were known and the coefficient could be stated a priori, digits were taken from a mathematical table at random. $x_t$ was taken as the sum of 5 digits, and $y_t$ also as the sum of 5 digits of which 3 were included in the 5 which made $x_t$ and 2 were different, and 24 pairs $(x_1, y_1) \ldots (x_t, y_t) \ldots$ were formed. The correlation coefficient for such pairs is $\tfrac{3}{5}$ (formula (96)). In the example in which only 24 pairs were taken it was .537; the standard deviation of the coefficient is $\frac{1 - .6^2}{\sqrt{24}} = .13$, so that the deficit from so small a number is not remarkable.

The  following  table  shows  the  working. 


   x      y       x²       y²       xy
  22     32      484    1,024      704
  27     27      729      729      729
  12     19      144      361      228
  21     30      441      900      630
  21     26      441      676      546
  27     26      729      676      702
  23     25      529      625      575
  17     22      289      484      374
  25     23      625      529      575
  11      9      121       81       99
  16     24      256      576      384
  20     28      400      784      560
  37     29    1,369      841    1,073
  33     25    1,089      625      825
  18     20      324      400      360
  24     26      576      676      624
  22     17      484      289      374
  17     16      289      256      272
  32     27    1,024      729      864
  29     29      841      841      841
  26     17      676      289      442
  27     20      729      400      540
  26     26      676      676      676
  21     17      441      289      357
 554    560   13,706   13,756   13,354

n = 24, $\bar{x} = 554/24 = 23\tfrac{1}{12}$, $\bar{y} = 560/24 = 23\tfrac{1}{3}$

$24\sigma_1^2 = 13706 - 24 \times (23\tfrac{1}{12})^2$, $\sigma_1 = 6.18$

$24\sigma_2^2 = 13756 - 24 \times (23\tfrac{1}{3})^2$, $\sigma_2 = 5.36$

$$r = \frac{13354 - 24\bar{x}\bar{y}}{24\sigma_1\sigma_2} = .537$$

The arithmetic is simpler if x and y are measured from an origin 23 in each case.
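
The working of Example 1 may be checked mechanically. The sketch below takes the 24 pairs from the table and applies the formula for r with x and y measured from zero; the variable names are hypothetical.

```python
import math

# The 24 pairs (x, y) of Example 1
pairs = [(22, 32), (27, 27), (12, 19), (21, 30), (21, 26), (27, 26),
         (23, 25), (17, 22), (25, 23), (11, 9), (16, 24), (20, 28),
         (37, 29), (33, 25), (18, 20), (24, 26), (22, 17), (17, 16),
         (32, 27), (29, 29), (26, 17), (27, 20), (26, 26), (21, 17)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xx = sum(x * x for x, _ in pairs)
sum_yy = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

mean_x = sum_x / n
mean_y = sum_y / n
sigma_1 = math.sqrt(sum_xx / n - mean_x ** 2)
sigma_2 = math.sqrt(sum_yy / n - mean_y ** 2)
r = (sum_xy - n * mean_x * mean_y) / (n * sigma_1 * sigma_2)
print(sum_x, sum_y, sum_xx, sum_yy, sum_xy)
print(round(sigma_1, 2), round(sigma_2, 2), round(r, 3))
```

The sums agree with the totals of the table, and the standard deviations and coefficient come out as 6.18, 5.36 and .537.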

Example 2. Where we have few and sporadic observations, it is simpler to work out the arithmetic in full. For example the infantile mortality in 26 towns is in the adjoining table compared with the population (to the nearest 1,000) of these towns. r is only twice its standard deviation, and its exact value is therefore uncertain, but there is evidence that the larger the towns the higher the mortality. To attack the question of the causes of infantile mortality seriously, it would of course be necessary to take many more instances and to consider many other factors besides crude population.


Population  and  Infantile  Mortality  in  26  Towns. 


Population (000).   Mortality.
        x               y             xy
       55             162          8,910
       39             201          7,839
       36             241          8,676
       35             162          5,670
       31             179          5,549
       30             174          5,220
       27             176          4,752
       24             208          4,992
       24             163          3,912
       23             206          4,738
       22             172          3,784
       20             200          4,000
       19             218          4,142
       19             198          3,762
       19             132          2,508
       16             155          2,480
       15             148          2,220
       15             220          3,300
       15             141          2,115
       12             169          2,028
        7             155          1,085
        6             129            774
        6             167          1,002
        5             150            750
        5             171            855
        4             161            644

Totals    529       4,558         95,707
Averages  20.346    175.31

Also $\sigma_1 = 12.1$, $\sigma_2 = 27.9$.

$$r = \frac{Sxy - n\bar{x}\bar{y}}{26\sigma_1\sigma_2} = .34, \quad \text{where } n = 26,\ \bar{x} = 20.346,\ \bar{y} = 175.31.$$

Standard deviation of r $= \frac{1 - .34^2}{\sqrt{26}} = .17$.

Example 3. A good illustrative example of method is obtained from statistics arising from the North Sea Fisheries Investigation of the size of herrings in relation to the rings which appear on their bodies and which are believed to show their age, one ring being formed each year.

The averages of arrays lie very near the theoretic straight line of regression, in spite of the skewness of the original curves.

In the table the size is measured on the axis of y, with origin at 31 cm. and unit 1 cm., and the number of rings is measured on the axis of x with origin at 7 rings and unit 1 ring, and the numbers of cases are entered in a square table. $n_2$ is the total of cases for a given value of y, and $n_2 y$ and $n_2 y^2$ and their sums are obtained in the last two columns, which lead to the average and standard deviation of the sizes. Similarly $n_1$ is the total of cases in an x array, and the sums of $n_1 x$ and $n_1 x^2$ lead to the average and standard deviation of the rings.

In the last line the average in each array is given, obtained in each x array by multiplying the numbers of cases by the corresponding values of y and dividing by the total of the array.

Underneath each number of cases is given in brackets the corresponding value of x × y; thus in the column under x = -1 in the row y = 3, we have four cases and xy = -3, so that the contribution of these four cases to the sum of xy is 4 × -3 = -12. The various terms thus contributed are shown below grouped in the four quadrants.

The origins are so chosen as to include as many zero terms in Sxy as possible. (Compare Yule, Theory of Statistics, p. 183.)


Herring.  Number  of  Rings  and  Size  (Length  in  Centimetres). 


Number of rings     4     5     6     7     8     9    10    11    12    13        Totals
              x    -3    -2    -1     0     1     2     3     4     5     6     n2    n2y   n2y²
Size (cm)    y
   35       4       -     -     1     -     1     2     2     -     -     -      6     24     96
                              (-4)         (4)   (8)  (12)
   34       3       -     1     4     4    15    14     7     3     1     1     50    150    450
                        (-6)  (-3)         (3)   (6)   (9)  (12)  (15)  (18)
   33       2       1     1    11    26    26    22    11     3     3     1    105    210    420
                  (-6)  (-4)  (-2)         (2)   (4)   (6)   (8)  (10)  (12)
   32       1       1    24    49    53    26     7     5     1     -     -    166    166    166
                  (-3)  (-2)  (-1)         (1)   (2)   (3)   (4)
   31       0       -    28    43    45    21     6     2     -     -     -    145      0      0
   30      -1       1    15    21    16     7     1     -     -     -     -     61    -61     61
                   (3)   (2)   (1)        (-1)  (-2)
   29      -2       2     3     5     -     -     -     -     -     -     -     10    -20     40
                   (6)   (4)   (2)
   28      -3       1     1     3     2     -     -     -     -     -     -      7    -21     63
                   (9)   (6)   (3)
   27      -4       -     3     -     -     -     -     -     -     -     -      3    -12     48
                         (8)
   26      -5       -     1     -     -     -     -     -     -     -     -      1     -5     25
                        (10)
Totals n1           6    77   137   146    96    52    27     7     4     2    554    431  1,369
n1x               -18  -154  -137     0    96   104    81    28    20    12    S n1x = 32
n1x²               54   308   137     0    96   208   243   112   100    72    S n1x² = 1330
Averages in
arrays (cm)     30.17 30.85 31.34 31.65 32.25 32.92 33.07  33.3     -     -

$\bar{x} = 32 \div 554 = .0578$. Average, 7.0578 rings.
$\sigma_1^2 = 1330 \div 554 - .0578^2 - .083$*; $\sigma_1 = 1.521$.
$\bar{y} = 431 \div 554 = .778$. Average, 31.778 cm.
$\sigma_2^2 = 1369 \div 554 - .778^2 - .083$*; $\sigma_2 = 1.335$.

* Sheppard's corrections, Appendix, Note 5.

ELEMENTS OF STATISTICS


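Terms of S xy, grouped in the four quadrants (signs of x and y):

x +, y +:  4, 16, 24, 45, 84, 63, 36, 15, 18, 52, 88, 66, 24, 30, 12, 26, 14, 15, 4. Total 636.
x -, y -:  3, 30, 21, 12, 12, 10, 9, 6, 9, 24, 10. Total 146.
x +, y -:  7, 2. Total 9.
x -, y +:  4, 6, 12, 6, 4, 22, 3, 48, 49. Total 154.

$Sxy = 636 + 146 - 9 - 154 = 619$

$$r = \frac{Sxy - 554\bar{x}\bar{y}}{554\sigma_1\sigma_2} = .528$$

Standard deviation of r = .026

The regression equation is

$$\frac{\text{Length} - 31.778\ \text{cm.}}{1.335\ \text{cm.}} = .528 \times \frac{\text{Number of rings} - 7.0578}{1.521}$$

Number of     Length deduced      Average of
rings.        from equation.      arrays.
                   cm.               cm.
   4              30.36             30.17
   5              30.82             30.85
   6              31.29             31.34
   7              31.75             31.65
   8              32.21             32.25
   9              32.68             32.92
  10              33.14             33.07
  11              33.60             33.3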
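
The arithmetic of Example 3 can be repeated from the grouped table. In the sketch below the frequencies are those of the table, read by rows from y = 4 to y = -5 and by columns from x = -3 to x = 6; Sheppard's correction is taken as 1/12 (the .083 of the text), and the variable names are hypothetical.

```python
import math

X_VALUES = range(-3, 7)   # number of rings, measured from 7
# Frequencies by row: key is y (size measured from 31 cm.)
TABLE = {
    4:  [0, 0, 1, 0, 1, 2, 2, 0, 0, 0],
    3:  [0, 1, 4, 4, 15, 14, 7, 3, 1, 1],
    2:  [1, 1, 11, 26, 26, 22, 11, 3, 3, 1],
    1:  [1, 24, 49, 53, 26, 7, 5, 1, 0, 0],
    0:  [0, 28, 43, 45, 21, 6, 2, 0, 0, 0],
    -1: [1, 15, 21, 16, 7, 1, 0, 0, 0, 0],
    -2: [2, 3, 5, 0, 0, 0, 0, 0, 0, 0],
    -3: [1, 1, 3, 2, 0, 0, 0, 0, 0, 0],
    -4: [0, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    -5: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}

n = s_x = s_y = s_xx = s_yy = s_xy = 0
for y, counts in TABLE.items():
    for x, count in zip(X_VALUES, counts):
        n += count
        s_x += count * x
        s_y += count * y
        s_xx += count * x * x
        s_yy += count * y * y
        s_xy += count * x * y

SHEPPARD = 1 / 12   # correction for grouping in unit intervals
mean_x = s_x / n
mean_y = s_y / n
sigma_1 = math.sqrt(s_xx / n - mean_x ** 2 - SHEPPARD)
sigma_2 = math.sqrt(s_yy / n - mean_y ** 2 - SHEPPARD)
r = (s_xy - n * mean_x * mean_y) / (n * sigma_1 * sigma_2)
print(n, s_x, s_y, s_xx, s_yy, s_xy)
print(round(sigma_1, 3), round(sigma_2, 3), round(r, 3))
```

The sums reproduce the totals 554, 32, 431, 1330, 1369 and 619, and the coefficient comes out as .528.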


Example 4. The following example is given to illustrate the value that may be obtained for r, when in the nature of the case there can be little correlation. For x the last digit of each of the (7 figure) logarithms of the numbers 2500-2549 and 2600-2649 was taken; for y the last digit of the logarithm of a number 50 greater than x, i.e. 2550-2599 and 2650-2699. r is found to be .086, which is less than its standard deviation for 100 pairs.

At  the  same  time  is  shown  an  alternative  method  of  setting 
out  the  arithmetic,  which  in  some  cases  is  simpler  than  the 
other  methods  used  in  this  section. 

This  method  leads  readily  also  to  the  calculation  of  the 
correlation  ratio. 


Occurrences  of  Pairs  of  Digits. 


                        y
  x    0   1   2   3   4   5   6   7   8   9    n_s    Sy    xSy   n_s ȳ_s²
  0    -   -   1   -   1   1   1   -   -   4      8    53      0      351
  1    3   -   -   1   1   -   -   1   1   1      8    31     31      120
  2    -   2   1   -   3   1   -   -   -   1      8    30     60      112
  3    2   -   -   1   -   1   -   2   2   3     11    65    195      384
  4    1   2   2   -   -   2   1   3   1   -     12    51    204      217
  5    1   1   -   -   1   1   4   1   -   -      9    41    205      187
  6    1   1   1   2   -   1   -   2   1   -      9    36    216      144
  7    3   3   2   1   1   3   1   -   3   1     18    68    476      257
  8    -   2   -   3   2   1   -   -   2   1     11    49    392      218
  9    -   -   -   1   -   -   -   1   1   3      6    45    405      338
      11  11   7   9   9  11   7  10  11  14    100   469  2,184    2,328

EXAMPLES OF CORRELATION

$\bar{x} = 4.72$, $\sigma_x = 2.69$, $\bar{y} = 4.69$, $\sigma_y = 3.03$, $n = 100$

$S(x - \bar{x})(y - \bar{y}) = Sxy - 100\bar{x}\bar{y} = S(xSy) - 100\bar{x}\bar{y} = 2184 - 2114 = 70$

$r = \frac{70}{100\sigma_x\sigma_y} = .086$. Standard deviation of r is .1.

Here $n_s$ is the number of times the various x digits 0, 1 ... were found. Sy is the sum of the corresponding y's; e.g. in the first line we have 2 + 4 + 5 + 6 + 9 × 4 = 53. $\bar{y}_s$ is the average y for a given x, and equals $Sy \div n_s$; and $n_s\bar{y}_s^2 = (Sy)^2 \div n_s$.

The correlation ratio is found from the last column (see p. 366).

$100\sigma_m^2 = S\,n_s(\bar{y}_s - \bar{y})^2$ * $= S\,n_s\bar{y}_s^2 - 2\bar{y}\,S\,n_s\bar{y}_s + n\bar{y}^2 = S\,n_s\bar{y}_s^2 - n\bar{y}^2 = 2328 - 2200 = 128.$

$\eta = \frac{\sigma_m}{\sigma_y} = \frac{1.13}{3.03} = .37$, and has a considerable value though the correlation coefficient is insignificant.
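
The calculation of the correlation ratio may be sketched as follows. The frequencies are entered as read from the table of pairs of digits; where a blank entry was not legible its position has been inferred from the line and column totals, and this is an assumption of the sketch. The variable names are hypothetical.

```python
import math

# ROWS[x][y] is the number of pairs with given last digits x and y
ROWS = [
    [0, 0, 1, 0, 1, 1, 1, 0, 0, 4],
    [3, 0, 0, 1, 1, 0, 0, 1, 1, 1],
    [0, 2, 1, 0, 3, 1, 0, 0, 0, 1],
    [2, 0, 0, 1, 0, 1, 0, 2, 2, 3],
    [1, 2, 2, 0, 0, 2, 1, 3, 1, 0],
    [1, 1, 0, 0, 1, 1, 4, 1, 0, 0],
    [1, 1, 1, 2, 0, 1, 0, 2, 1, 0],
    [3, 3, 2, 1, 1, 3, 1, 0, 3, 1],
    [0, 2, 0, 3, 2, 1, 0, 0, 2, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 1, 3],
]

n_s = [sum(row) for row in ROWS]                                  # cases in each x array
sum_y = [sum(y * count for y, count in enumerate(row)) for row in ROWS]
n = sum(n_s)
mean_y = sum(sum_y) / n

# n * sigma_m^2 = S n_s * (array mean)^2 - n * (general mean)^2
n_sigma_m_sq = sum(s * s / k for s, k in zip(sum_y, n_s)) - n * mean_y ** 2
# n * sigma_y^2 from the column totals
n_sigma_y_sq = sum(y * y * count for row in ROWS for y, count in enumerate(row)) - n * mean_y ** 2

eta = math.sqrt(n_sigma_m_sq / n_sigma_y_sq)
print(n_s, sum_y)
print(round(math.sqrt(n_sigma_y_sq / n), 2), round(eta, 2))
```

The array totals and sums agree with the columns headed n_s and Sy, the standard deviation of y is 3.03, and the correlation ratio is .37.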


Example 5. The following table gives data from "The Report on Heights and Weights of New York City Children" for 3,405 boys aged 14-15.

 Height.    Number.   Average weight                          Average weight
 Origin               as in report.                           from equation.
 61 inches.           Origin 100 lbs.                         Origin 100 lbs.
    x         n_s         ȳ_s         n_s ȳ_s    x n_s ȳ_s                      n_s ȳ_s²
   -12          1         -12            -12          144        -49.6              144
    -9          1         -20            -20          180        -36.6              400
    -7         13         -18           -234        1,638        -27.9            4,212
    -6         59         -19         -1,121        6,726        -23.6           21,299
    -5         96         -17         -1,632        8,160        -19.2           27,744
    -4        190         -14         -2,660       10,640        -14.9           37,240
    -3        283         -12         -3,396       10,188        -10.5           40,752
    -2        349          -8         -2,792        5,584         -6.2           22,336
    -1        440          -3         -1,320        1,320         -1.9            3,960
     0        434          +2           +868            0         +2.5            1,736
    +1        400          +7         +2,800        2,800         +6.8           19,600
    +2        355         +11         +3,905        7,810        +11.2           42,955
    +3        307         +17         +5,219       15,657        +15.5           88,723
    +4        200         +20         +4,000       16,000        +20.0           80,000
    +5        137         +24         +3,288       16,440        +24.2           78,912
    +6         78         +30         +2,340       14,040        +28.6           70,200
    +7         34         +35         +1,190        8,330        +32.9           41,650
    +8         15         +34           +510        4,080        +37.2           17,340
    +9          6         +42           +252        2,268        +41.6           10,584
   +10          7         +42           +294        2,940        +45.9           12,348
 Totals     3,405           -        +11,479      134,945            -          622,135

* On p. 366 the y's are measured from their average. Here it is necessary to subtract $\bar{y}$ throughout.

$\bar{x} = .2270$, $\sigma_1 = 2.99$. Average 61.227 inches.
$\bar{y} = 3.371$, $\sigma_2 = 16.3$.* Average 103.37 lbs.

$$r = \frac{Sxy - n\bar{x}\bar{y}}{n\sigma_1\sigma_2} = \left(\frac{134945}{3405} - .2270 \times 3.371\right) \div \sigma_1\sigma_2 = .797$$

The regression equation is

$$\frac{\text{Weight} - 103.4}{16.3} = .797 \times \frac{\text{Height} - 61.23}{2.99}$$

or Weight = 103.4 + 4.345 (Height - 61.23).

The weights obtained from this equation are given in the sixth column of the table and should be compared with average weights corresponding to various heights given in the third column. The agreement is close from about 57 inches to 70 inches; but below 57 inches actual weights do not fall off so rapidly as in the formula. The regression is not in fact linear for low statures.

$3405\sigma_m^2 = S\,n_s\bar{y}_s^2 - 3405\bar{y}^2$, and $\sigma_m = 13.1$

$\eta = \sigma_m \div \sigma_2 = .81$


Here  the  correlation  ratio  is  practically  the  same  as  the 
correlation  coefficient. 
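
The figures of Example 5 can be recomputed from the first three columns of the table. In this sketch the standard deviation of the weights, 16.3, is taken from the text, since the individual weights are not reproduced; no correction for grouping is applied to the heights, which is an assumption; and the variable names are hypothetical.

```python
import math

# (height x measured from 61 inches, number of boys, reported average
#  weight measured from 100 lbs.)
ROWS = [(-12, 1, -12), (-9, 1, -20), (-7, 13, -18), (-6, 59, -19),
        (-5, 96, -17), (-4, 190, -14), (-3, 283, -12), (-2, 349, -8),
        (-1, 440, -3), (0, 434, 2), (1, 400, 7), (2, 355, 11),
        (3, 307, 17), (4, 200, 20), (5, 137, 24), (6, 78, 30),
        (7, 34, 35), (8, 15, 34), (9, 6, 42), (10, 7, 42)]
SIGMA_2 = 16.3   # quoted in the text, not derivable from the table

n = sum(count for _, count, _ in ROWS)
mean_x = sum(x * count for x, count, _ in ROWS) / n
mean_y = sum(count * w for _, count, w in ROWS) / n
sigma_1 = math.sqrt(sum(count * x * x for x, count, _ in ROWS) / n - mean_x ** 2)
s_xy = sum(x * count * w for x, count, w in ROWS)

r = (s_xy / n - mean_x * mean_y) / (sigma_1 * SIGMA_2)
slope = r * SIGMA_2 / sigma_1          # lbs. per inch in the regression equation
sigma_m = math.sqrt(sum(count * w * w for _, count, w in ROWS) / n - mean_y ** 2)
print(n, s_xy, round(mean_x, 4), round(mean_y, 3))
print(round(sigma_1, 2), round(r, 3), round(slope, 3), round(sigma_m, 1))
```

The totals 3,405 and 134,945 are reproduced, and the values found for the averages, for $\sigma_1$, r and $\sigma_m$ agree with those stated, the gradient being about 4.34 lbs. per inch.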

Example 6. The methods discussed on pp. 374-8 for measuring the correlation between two time-series are worked out by comparing the value of imports into the United Kingdom per head of the population with the marriage rate in England and Wales, year by year.

x is the excess of the imports in any year over the average of the five years of which the year in question is central. y is similarly obtained from the marriage rate.

$\bar{x} = -.62$, $\bar{y} = -.3$, $\sigma_1 = 36.9$, $\sigma_2 = 3.61$, $Sxy = 4309$, $n = 50$

$$r = \frac{4309 \div 50 - .3 \times .62}{36.9 \times 3.61} = .65$$


This  is  the  measurement  of  the  correlation  by  the  use  of 
moving  averages. 


*  Calculated  from  data  not  reproduced  here. 


          Imports per head.                 Marriage Rate, England and Wales.
Years.   Annual.   5 years'   Deviation    Annual.   5 years'   Deviation        xy
            £      average.       x                  average.       y         +       -
1845      3.30        -           -         17.2        -           -         -       -
1846      3.15        -           -         17.2        -           -         -       -
1847      3.21      3.22         -1         15.8      16.5         -7         7       -
1848      2.91      3.35        -44         15.9      16.5         -6       264       -
1849      3.52      3.55         -3         16.2      16.5         -3         9       -
1850      3.97      3.78        +19         17.2      16.9         +3        57       -
1851      4.14      4.30        -16         17.2      17.2          0         0       -
1852      4.35      4.70        -35         17.4      17.4          0         0       -
1853      5.51      4.93        +58         17.9      17.2         +7       406       -
1854      5.51      5.34        +17         17.2      17.1         +1        17       -
1855      5.16      5.80        -64         16.2      16.9         -7       448       -
1856      6.16      5.86        +30         16.7      16.5         +2        60       -
1857      6.66      6.01        +65         16.5      16.5          0         0       -
1858      5.80      6.44        -64         16.0      16.7         -7       448       -
1859      6.26      6.71        -45         17.0      16.6         +4         -     180
1860      7.32      6.92        +40         17.1      16.5         +6       240       -
1861      7.50      7.45         +5         16.3      16.7         -4         -      20
1862      7.72      8.05        -33         16.1      16.7         -6       198       -
1863      8.45      8.40         +5         16.8      16.8          0         0       -
1864      9.26      8.86        +40         17.2      17.0         +2        80       -
1865      9.06      9.12         -6         17.5      17.1         +4         -      24
1866      9.80      9.35        +45         17.5      17.0         +5       225       -
1867      9.05      9.41        -36         16.5      16.7         -2        72       -
1868      9.60      9.54         +6         16.1      16.4         -3         -      18
1869      9.54      9.68        -14         15.9      16.3         -4        56       -
1870      9.70     10.09        -39         16.1      16.4         -3       117       -
1871     10.49     10.48         +1         16.7      16.7          0         0       -
1872     11.13     10.85        +28         17.4      17.0         +4       112       -
1873     11.54     11.19        +35         17.6      17.1         +5       175       -
1874     11.39     11.35         +4         17.0      17.0          0         0       -
1875     11.39     11.47         -8         16.7      16.7          0         0       -
1876     11.30     11.34         -4         16.5      16.2         +3         -      12
1877     11.75     11.18        +57         15.7      15.7          0         0       -
1878     10.87     11.28        -41         15.2      15.3         -1        41       -
1879     10.59     11.29        -70         14.4      15.1         -7       490       -
1880     11.88     11.29        +59         14.9      15.0         -1         -      59
1881     11.37     11.52        -15         15.1      15.1          0         0       -
1882     11.73     11.59        +14         15.5      15.2         +3        42       -
1883     12.04     11.27        +77         15.5      15.1         +4       308       -
1884     10.92     10.92          0         15.1      15.0         +1         0       -
1885     10.30     10.56        -26         14.5      14.7         -2        52       -
1886      9.63     10.25        -62         14.2      14.5         -3       186       -
1887      9.90     10.37        -47         14.4      14.5         -1        47       -
1888     10.51     10.55         -4         14.4      14.7         -3        12       -
1889     11.50     10.93        +57         15.0      15.0          0         0       -
1890     11.22     11.17         +5         15.5      15.2         +3        15       -
1891     11.52     11.17        +35         15.6      15.2         +4       140       -
1892     11.12     10.97        +15         15.4      15.2         +2        30       -
1893     10.50     10.85        -35         14.7      15.1         -4       140       -
1894     10.50     10.78        -28         15.0      15.2         -2        56       -
1895     10.61     10.81        -20         15.0      15.3         -3        60       -
1896     11.15     11.03        +12         15.7      15.6         +1        12       -
1897     11.27        -           -         16.0        -           -         -       -
1898     11.64        -           -         16.2        -           -         -       -


To  obtain  the  measurement  by  comparing  differences,  the 
table  of  which  the  first  lines  are  given  was  completed. 


                  Imports.                    Marriage Rate.

Year      X      DX     D²X         Y      DY     D²Y        DX.DY    D²X.D²Y
1845     330    -15      ..        172      0      ..           0        ..
1846     315     +6     +21        172    -14     -14         -84      -294
1847     321    -30     -36        158     +1     +15         -30      -540
1848     291     ..     +91        159     ..      +2          ..      +182

                             DX       D²X      DY      D²Y
Average . . . . . . . .     15.74      1      -.19     .04
Standard Deviation . . .    57.13     80       5.3     6.78

Sum of DX.DY = 8902. Sum of D²X.D²Y = 12076.

Hence r from first differences is .60 and from second differences .45.

Example 7. In the experiment described on pp. 304-6,
1,000 sums were formed each of the number of letters in
10 words. Write A for the sum of the letters in the first
5 words, B for the sum of the second 5, so that x = A + B.
After each 10 a further 5 words were taken, for the sum of
whose letters we write C; then y was taken as B + C. We
have thus 1,000 pairs, for which the correlation coefficient
should be 1/2, with standard deviation .024.

Actually the correlation coefficient was .553, more than
twice the standard deviation from the fraction expected. A
possible explanation of this is in the want of complete
independence discussed on p. 306. The coefficients for four
separate groups of 250 (for which the standard deviation is
.047) were .56, .50, .58, .59.
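The figures quoted in Example 7 can be checked with a short sketch. It assumes the usual large-sample expression (1 - r²)/√n for the standard deviation of a correlation coefficient, which reproduces the .024 and .047 of the text; the function names are illustrative only and are not taken from the book.

```python
import math


def expected_r_shared_half():
    """Expected correlation of x = A + B and y = B + C.

    With A, B, C independent and equally variable, cov(x, y) = var(B)
    and var(x) = var(y) = 2 var(B), so r = 1/2.
    """
    return 0.5


def correlation_sd(r, n):
    """Large-sample standard deviation of r from n pairs (assumed formula)."""
    return (1.0 - r * r) / math.sqrt(n)


r_expected = expected_r_shared_half()
sd_1000 = correlation_sd(r_expected, 1000)   # whole experiment
sd_250 = correlation_sd(r_expected, 250)     # each group of 250
# Observed .553 measured in units of the standard deviation.
excess_in_sd = (0.553 - r_expected) / sd_1000
print(round(sd_1000, 3), round(sd_250, 3), round(excess_in_sd, 1))
```

The last figure shows the observed coefficient lying a little more than two standard deviations above one half, as the text states.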

The regression is nearly rectilinear in the central region
from x = 40 to x = 61; outside these numbers there are less
than 20 cases to one value of x, and the standard deviation
of the average of an array is greater than 2, so that a comparison
is not worth while. The standard deviations of the array
averages included are from 1.3 to 2.0.


EXAMPLES  OF  CORRELATION 




Value     Average value of       Interquartile         y deduced
of x.     corresponding y's.     range of the y's.     from equation.

 40           48.3                  10½                   45.3
 41           44.8                  10                    45.8
 42           49.1                  11                    46.4
 43           47.4                  11                    46.9
 44           47.1                   9                    47.5
 45           46.4                   9½                   48.0
 46           48.9                  12                    48.5
 47           46.4                  10                    49.1
 48           50.2                  11                    49.6
 49           51.1                  12                    50.2
 50           51.5                  11½                   50.7
 51           49.6                   7                    51.3
 52           53.6                  13                    51.8
 53           53.6                  16                    52.3
 54           51.1                  10                    52.9
 55           51.9                   6                    53.4
 56           53.4                  14                    54.0
 57           52.7                  12                    54.5
 58           52.7                  11                    55.1
 59           60.2                   8                    55.6
 60           56                    11                    56.1
 61           57.2                  11                    56.7

The interquartile range as calculated from theory (.67 of
$2\sigma\sqrt{1 - r^2}$), formulae (26) and (107), is 10.5, to which the
observed ranges approximate, their average being 10.75. The
range appears to be independent of the value of x, as was to be
expected from the theory (formula (107)).
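As a hedged illustration of this comparison, the sketch below averages the 22 observed interquartile ranges of the table and evaluates the theoretical expression. Which standard deviation the author inserted is not stated; σ = 9.32 with r = .553 and the quartile factor .6745 is an assumption made here only because it lands near the quoted 10.5.

```python
import math

# Observed interquartile ranges of the y-arrays for x = 40 ... 61.
observed_ranges = [10.5, 10, 11, 11, 9, 9.5, 12, 10, 11, 12, 11.5,
                   7, 13, 16, 10, 6, 14, 12, 11, 8, 11, 11]


def theoretical_iqr(sigma, r, quartile_factor=0.6745):
    """Interquartile range of an array: 2 * factor * sigma * sqrt(1 - r^2)."""
    return 2.0 * quartile_factor * sigma * math.sqrt(1.0 - r * r)


mean_observed = sum(observed_ranges) / len(observed_ranges)
theory = theoretical_iqr(sigma=9.32, r=0.553)  # assumed inputs, see above
print(mean_observed, round(theory, 1))
```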

The correspondence of these numbers is evident from the
diagram, where the equation of the line of regression is

$$\frac{y - 51.50}{9.24} = .553\,\frac{x - 51.46}{9.43}.$$
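The column "y deduced from equation" can be regenerated from the line of regression. The sketch below is only an arithmetical check; it takes 9.24 as the standard deviation of y and 9.43 as that of x, the reading that reproduces the tabulated values, and the function name is illustrative.

```python
def y_from_equation(x, mean_x=51.46, mean_y=51.50,
                    sd_x=9.43, sd_y=9.24, r=0.553):
    """Value of y given by (y - mean_y)/sd_y = r (x - mean_x)/sd_x."""
    return mean_y + r * (sd_y / sd_x) * (x - mean_x)


# Regenerate the last column of the table for x = 40 ... 61.
deduced = {x: round(y_from_equation(x), 1) for x in range(40, 62)}
print(deduced[40], deduced[50], deduced[61])
```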


[Figure: Number of letters experiment. Averages of arrays and line of regression. The broken line shows the averages of the arrays; the crosses show the quartiles in each array; the straight line has for its equation (y - 51.50)/9.24 = .553 (x - 51.46)/9.43.]

We obtain a better numerical view of the regression if we
group the numbers in wider grades, in which the error of
sampling is diminished, thus:

Grade      Number of     Average of     Average of              x from
of y.      cases.        y.             corresponding x's.      equation.

30-39         85           36.3             43.8                  42.9
40-49        348           44.7             47.7                  47.6
50-59        360           54.3             52.7                  53.0
60-69        173           63.0             57.7                  58.0
70-79         30           72.7             64.1                  63.4

Here the regression of x on y is taken; in the diagram
the regression is that of y on x.

There  are  two  examples  below  30  and  two  above  80. 


There  are  various  methods  of  comparing  the  distributions 
of  observations  with  that  given  by  the  normal  correlation 
surface,  of  which  the  simplest  is  as  follows. 

Take r = 1/2 as given a priori and σ = 9.32, the mean of
the standard deviations of x and y.

The equation of the surface is

$$z = \frac{1}{2\pi\sigma^2\sqrt{1 - r^2}}\; e^{-\frac{x^2 - 2rxy + y^2}{2\sigma^2(1 - r^2)}}.$$

Write

$$X = \frac{x + y}{\sqrt{2}}, \qquad Y = \frac{-x + y}{\sqrt{2}}.$$

The equation becomes $z = \dfrac{1}{\pi\sigma^2\sqrt{3}}\; e^{-X^2/3\sigma^2} \cdot e^{-Y^2/\sigma^2}$, and represents
the surface referred to its principal axes, inclined at 45° to the
original axes.

The volume standing on the area bounded by $X = X_1$,
$X = X_2$, $Y = Y_1$, $Y = Y_2$ is

$$\frac{1}{\sqrt{2\pi}\,\sigma\sqrt{3/2}} \int_{X_1}^{X_2} e^{-X^2/3\sigma^2}\,dX \;\times\; \frac{1}{\sqrt{2\pi}\,\sigma\sqrt{1/2}} \int_{Y_1}^{Y_2} e^{-Y^2/\sigma^2}\,dY,$$

and can be obtained at once from the table on p. 271.

Mark distances on the axis of X as in the figure, $OM_1$,
$M_1M_2$ ... each equal to $\sigma/\sqrt{2}$. Write $\sigma_1$, $\sigma_2$ for the standard
deviations of X and Y: $\sigma_1 = \sigma\sqrt{3/2}$, $\sigma_2 = \sigma/\sqrt{2}$.

Then $OM_1 = M_1M_2 = \ldots = \sigma_1/\sqrt{3} = .577\,\sigma_1$.

The proportional volumes of the solid bounded by vertical
planes perpendicular to the axis of X are then F(.577)
across $OM_1$, F(1.155) - F(.577) across $M_1M_2$, etc., which can
be found from the table on p. 271, as .2180, .1580, etc.

Now mark distances $ON_1$, $N_1N_2$ ... on the axis of Y each
equal to $\sigma/\sqrt{2}$, that is to $\sigma_2$.

The proportional volumes bounded by vertical planes
perpendicular to the axis of Y are F(1) across $ON_1$, F(2) - F(1)
across $N_1N_2$, etc., viz. .3413, .1359, etc.

Since in the equation the integrals of X and Y are independent,
all sections perpendicular to the axis of X are cut
in the same proportions by the planes through $N_1$, $N_2$, etc.

Hence we have the following table which shows the proportions
of the volume of the normal surface standing on the
squares $M_1K_1L_1M_2$, $M_2L_1L_2M_3$ ... in the first line,
on $N_1N_2P_1K_1$, $K_1P_1P_2L_1$ ... in the second line, etc.


Distribution on Squares of Normal Frequency Surface.

X/σ1 . . . . . . . . . . . . .     .577    1.155    1.732    2.31     2.89     3.46
F(X/σ1) (differences) . . . .     .2180    .1580    .0824    .0311    .0085    .0017

Y/σ2     F(Y/σ2)                              Products of Differences.
         Differences.

 1        .3413                   .0744    .0539    .0282    .0106    .0028    .0006
 2        .1359                   .0296    .0215    .0112    .0042    .0012    .0002
 3        .0214                   .0047    .0034    .0017    .0007    .0002    .0000
 4        .0014                   .0003    .0002    .0001    .0001    .0000    .0000

The  distribution  is  the  same  in  each  of  the  four  quadrants 
formed  by  the  axes  OX  and  OY. 
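The entries of this table can be recomputed from the normal integral. The sketch below is an illustration, not the author's procedure (he reads F from the table on p. 271): it uses the error function from Python's standard library, strips of width σ1/√3 along X and of width σ2 along Y. Small differences in the fourth decimal place from the printed figures are to be expected from rounding.

```python
import math


def F(t):
    """Area of the normal curve of unit standard deviation between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2.0))


def strip_areas(step, count):
    """Areas over successive strips of the given width, starting at 0."""
    edges = [i * step for i in range(count + 1)]
    return [F(b) - F(a) for a, b in zip(edges, edges[1:])]


x_strips = strip_areas(1.0 / math.sqrt(3.0), 6)   # .2180, .1580, ...
y_strips = strip_areas(1.0, 4)                    # .3413, .1359, ...

# Volume on each square is the product of the two strip areas.
table = [[fy * fx for fx in x_strips] for fy in y_strips]
for row in table:
    print(" ".join(f"{v:.4f}" for v in row))
```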

The decimals in this table × 1000 give the theoretic
distribution of the 1,000 pairs of numbers if we neglect the
skewness.

The  observations  were  marked  in  on  squared  paper  and 
the  number  occurring  in  each  of  the  X,Y  squares  was  counted. 

The  results  are  shown  in  the  table  on  p.  394.  The  first  line 
in  each  row  repeats  the  theoretic  numbers  first  given,  the  third 
gives  the  observations. 

The agreement is close within the three squares to right
and left, and two squares above and below the centre, that is
within ±1.7σ1 and ±2σ2. The probability of so much
divergence  in  a  random  selection  is  approximately  f  (p.  433). 

To the left of these squares there is a falling-off in the
observations (31 observations against 41 expected) and to the
right an excess (54 observations against 41 expected). There
is, however, a slight heaping up in the 12 squares to the left
of the centre and a corresponding deficit to the right. These
are exactly the phenomena we should expect from the skewness
of the original curve (p. 304). The effect of the skewness
is worked out in the note at the end of this chapter, and the
results of the corrections are given in the second line of each
row in the following table. The improvement is marked.
For example the expectation in the last three columns to
the left is now 33 (31 observations) and in the last three
columns to the right is about 49 (54 observations).


1,000 Pairs of Totals of Letters.

Distribution of observations compared with normal and
with skew frequency.

The central horizontal and vertical lines are not the axes
of co-ordinates, but are the axes of symmetry, which are
inclined at 45°.

First lines.   Normal distribution . . .  (thus 29.6)
Second lines.  Second approximation . .  (thus 28.7)
Third lines.   Observations . . . . . .  (thus 35)

    0      0     .1     .1     .2     .3     .3     .2     .1     .1      0      0
    0      0      ?      ?      0     .2     .4     .4      ?      ?      0      0
    0      0      0      0      0      1      0      1      0      0      0      0

    0     .2     .7    1.7    3.4    4.7    4.7    3.4    1.7     .7     .2      0
    0      ?      0     .2    1.9    4.3    5.1    4.9    3.2    1.7      ?      0
    0      0      0      1      1      1      5      6      4      0      0      0

   .2    1.2    4.2   11.2   21.5   29.6   29.6   21.5   11.2    4.2    1.2     .2
    ?     .2    2.8   10.8   22.3   30.5   28.7   20.7   11.6    5.6    2.2      ?
    0      0      5     11     20     33     35     20     11      5      0      0

   .6    2.8   10.6   28.2   53.9   74.4   74.4   53.9   28.2   10.6    2.8     .6
    0    2.7   11.1   33.3   63.0   79.2   69.6   44.8   23.1   10.1    2.9    1.6
    0      2     12     34     47     77     72     61     24      8      5      1

   .6    2.8   10.6   28.2   53.9   74.4   74.4   53.9   28.2   10.6    2.8     .6
    0    2.7   11.1   33.3   63.0   79.2   69.6   44.8   23.1   10.1    2.9    1.6
    0      1      9     36     75     76     64     46     26     15      5      4

   .2    1.2    4.2   11.2   21.5   29.6   29.6   21.5   11.2    4.2    1.2     .2
    ?     .2    2.8   10.8   22.3   30.5   28.7   20.7   11.6    5.6    2.2      ?
    0      1      1      8     21     32     26     18      9      8      0      2

    0     .2     .7    1.7    3.4    4.7    4.7    3.4    1.7     .7     .2      0
    0      ?      0     .2    1.9    4.3    5.1    4.9    3.2    1.7      ?      0
    0      0      0      0      3      2      2      0      3      1      0      0

    0      0     .1     .1     .2     .3     .3     .2     .1     .1      0      0
    0      0      ?      ?      0     .2     .4     .4      ?      ?      0      0
    0      0      0      0      0      0      2      1      0      0      0      0
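The totals quoted in the text for the outlying columns can be recovered from the table. The sketch below is only a bookkeeping check: the grid of observations is transcribed from the third lines of the table as read here (any misreading of the print would carry through), and the normal expectations in the three left-hand columns are taken from the first lines.

```python
# Observed counts, one list per row of the table (12 columns each).
observed = [
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 5, 6, 4, 0, 0, 0],
    [0, 0, 5, 11, 20, 33, 35, 20, 11, 5, 0, 0],
    [0, 2, 12, 34, 47, 77, 72, 61, 24, 8, 5, 1],
    [0, 1, 9, 36, 75, 76, 64, 46, 26, 15, 5, 4],
    [0, 1, 1, 8, 21, 32, 26, 18, 9, 8, 0, 2],
    [0, 0, 0, 0, 3, 2, 2, 0, 3, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 0],
]

# Normal expectations in the three outermost columns, upper four rows;
# the lower four rows repeat them by symmetry.
normal_outer = [[0, 0, .1], [0, .2, .7], [.2, 1.2, 4.2], [.6, 2.8, 10.6]]

total = sum(sum(row) for row in observed)
left_tail = sum(sum(row[:3]) for row in observed)     # last three columns, left
right_tail = sum(sum(row[-3:]) for row in observed)   # last three columns, right
normal_tail = 2 * sum(sum(row) for row in normal_outer)
print(total, left_tail, right_tail, round(normal_tail, 1))
```

On this reading the observations total 1,000, with 31 on the left and 54 on the right against about 41 expected on either side from the normal surface.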

The probability of the divergence from expectation as a
whole has been tested (see p. 433), and is approximately 2/5; that
is, in only two such experiments out of five should we expect
so close an agreement.* On the other hand, it is highly
improbable that we should get so great divergence on the left
and right if the distribution had been normal (and symmetrical);
the second approximation is necessary for the completion of
the theory.

* The thick lines in the table are only to mark the regions to which the
test of p. 431 is applied.

A simple method of testing the agreement of the distribution
of observations with that given by the normal surface
may be obtained by studying the distribution of the y-arrays,
instead of transferring as on p. 391 to axes of symmetry.

Here $\sigma' = \sigma_2\sqrt{1 - r^2}$ and $y' = y - r\,\dfrac{\sigma_2}{\sigma_1}\,x$, i.e., y' is measured parallel
to the axis of y from the line of regression, as in formula (107).

F(x)                  0-1      1-2      2-3
                      .341     .136     .021

          F(y')       Products.

0-1       .341        .116     .047     .007
1-2       .136        .047     .018     .003
2-3       .021        .007     .003     .001

The  division  when  the  standard  deviations  are  taken  as 
units  is  shown  in  the  table  and  diagram. 

The  results  of  the  words  experiment  tabulated  on  this 
basis  are  : — 

[Diagram: the compartments formed by the vertical lines x = -3σ, -2σ, -σ, 0, σ, 2σ, 3σ and by lines parallel to the line of regression at y' = 3σ', 2σ', σ', 0, -σ', -2σ', -3σ'.]

The vertical columns show the y-arrays, and are comparable
with the more detailed setting out on p. 389. In each compartment
the calculated values are written above the number
of observations.

The  effect  of  correcting  for  skewness  would  be  to  improve 
the  correspondence  in  the  same  directions  as  in  the  former 
tabulation. 

Note  on  the  Second  Approximation  to  the  Correlation  Surface. 

When terms of the order $1/\sqrt{n}$ are retained in the general law of great
numbers, a term involving the mean cube of error appears in the equation
(p. 298). Similarly Prof. Edgeworth shows* that the equation of the
correlation surface should under similar conditions be written

$$z = z_0 - \frac{1}{6}\left\{ k_{30}\frac{\partial^3 z_0}{\partial x^3} + 3k_{21}\frac{\partial^3 z_0}{\partial x^2\,\partial y} + 3k_{12}\frac{\partial^3 z_0}{\partial x\,\partial y^2} + k_{03}\frac{\partial^3 z_0}{\partial y^3} \right\} \qquad (113)$$

where

$$z_0 = \frac{1}{2\pi\sqrt{1 - r^2}}\; e^{-\frac{x^2 - 2rxy + y^2}{2(1 - r^2)}},$$

x and y are the differences between the observations and their averages
divided by their standard deviations, and

$k_{30}$ = mean $x^3$, $k_{21}$ = mean $x^2y$, $k_{12}$ = mean $xy^2$, $k_{03}$ = mean $y^3$.

In the example on pp. 388 seq. we have $x\sigma = A + B$, $y\sigma = B + C$, where
σ = 9.44 approx.

Mean $xy\sigma^2$ = mean $B^2$ = $\tfrac{1}{2}\sigma^2$, and r = 1/2. (In the experiment r was found
to be .55.)

Mean $x^3 = k_{30} = k_{03} = \kappa$ (as on p. 251) = .409.

Mean $x^2y\,\sigma^3 = k_{21}\sigma^3$ = mean $B^3$ = $\tfrac{1}{2}$ mean $(A + B)^3$, so that $k_{21} = \tfrac{1}{2}\kappa = k_{12}$.

When the differentiations are performed with these values, we obtain

$$z = \frac{1}{\pi\sqrt{3}}\; e^{-\frac{2}{3}(x^2 - xy + y^2)} \left\{ 1 - .01\,(x + y)(18 + 11xy - 8x^2 - 8y^2) \right\} = z_0(1 - w), \text{ say}.$$

The expression $z_0w$ is not readily integrable, and the simpler method of
procedure is to integrate $z_0$ over suitable areas, and to correct the results
approximately. An application of the method that leads to Simpson's rule
shows that if

$$z_{0,k},\quad z_{0,-k},\quad z_{h,0},\quad z_{-h,0}$$

are the ordinates of the four corners of a surface standing on a rectangular
base, whose diagonals are 2h, 2k, and $z_{00}$ is the ordinate at the centre of the
rectangle, then the mean ordinate is

$$\tfrac{1}{6}\left(2z_{00} + z_{0,k} + z_{0,-k} + z_{h,0} + z_{-h,0}\right).$$

Hence if $z_0w$ is calculated for the four corners and centre of each of the
volumes tabulated on p. 393, we should reduce each of the volumes by a
quantity $z_0w'$, where w' is the average of twice its central value and the four
angular values. This has been done throughout, and the values obtained
added to or subtracted from the numbers given by the normal curve to obtain
the corrected values on p. 394.

Notice that when the surface is referred to its principal axes by writing
$x + y = \sqrt{2}\,X$, $-x + y = \sqrt{2}\,Y$, w becomes symmetrical in Y, but not
in X.
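The reduction of formula (113) to the factor 1 - w can be checked numerically. The sketch below is an independent check and not Edgeworth's or the author's working: it takes r = 1/2, k30 = k03 = κ = .409 and k21 = k12 = κ/2 from the note, forms the third derivatives of z0 by nested central differences (the step h is an arbitrary choice), and compares the result with the closed form (2κ/81)(x + y)(18 + 11xy - 8x² - 8y²), whose leading constant 2κ/81 is about .0101 and is printed as .01.

```python
import math

KAPPA = 0.409  # mean cube of error, as quoted in the note


def z0(x, y):
    """Normal correlation surface in standard units with r = 1/2."""
    q = (2.0 / 3.0) * (x * x - x * y + y * y)
    return math.exp(-q) / (math.pi * math.sqrt(3.0))


def partial(f, axis, h=1e-3):
    """Central-difference derivative of f(x, y) along axis 0 (x) or 1 (y)."""
    if axis == 0:
        return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2.0 * h)


def third(f, axes):
    """Third-order derivative obtained by differentiating along each axis in turn."""
    for axis in axes:
        f = partial(f, axis)
    return f


def w_numeric(x, y, k=KAPPA):
    """w defined by z = z0 (1 - w), from formula (113)."""
    k30 = k03 = k
    k21 = k12 = k / 2.0
    s = (k30 * third(z0, (0, 0, 0))(x, y)
         + 3.0 * k21 * third(z0, (0, 0, 1))(x, y)
         + 3.0 * k12 * third(z0, (0, 1, 1))(x, y)
         + k03 * third(z0, (1, 1, 1))(x, y))
    return s / (6.0 * z0(x, y))


def w_closed(x, y, k=KAPPA):
    """Closed form of w; 2k/81 is the constant printed as .01."""
    return (2.0 * k / 81.0) * (x + y) * (18.0 + 11.0 * x * y - 8.0 * x * x - 8.0 * y * y)


print(round(w_numeric(0.7, -0.3), 5), round(w_closed(0.7, -0.3), 5))
```

Interchanging x and y leaves the closed form unchanged, which corresponds to the remark that w is symmetrical in Y but not in X.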


* Law of Error (Camb. Phil. Trans., Vol. XX, 1905, Part II, § 6), and
Statistical Journal, 1917, pp. 268 seq. The standard deviation is used as
unit in the text instead of the modulus $\sqrt{2}\,\sigma$ used by Edgeworth.


PARTIAL AND MULTIPLE CORRELATION

The  investigation  of  Chapter  VII  has  shown  how  the 
variations  in  one  quantity  are  related  to  the  variations  in 
another  by  which  it  is  influenced.  It  frequently  happens, 
however,  that  the  movements  of  a  variable  are  related  to  the 
movements of a number of others. The frequency distribution
can then no longer be represented by a surface in three
dimensions,  but  an  analogous  function  is  obtained  of  which 
the  form  already  given  is  a  simple  case. 

The  regression  equation  is  no  longer  that  of  a  line  or 
curve,  but  an  expression  connecting  one  selected  variable 
with  a  number  of  others ;  we  can  then  isolate  the  effect  of 
any  one  of  the  remaining  variables  (by  a  method  involving 
and  similar  to  that  of  partial  differentiation),  and  so  obtain 
the  relation  between  any  pair  of  variables,  abstraction  being 
made  of  the  remainder.  This  is  the  very  important  method 
of  partial  correlation. 

In  the  sequel  the  case  of  three  variables  is  handled  in 
detail,  and  the  more  general  solution  is  summarized. 

Let there be three variables, which measured from their
means are x, y, z, and let them be correlated each with each.

Suppose that they are so connected that z = ax + by + c
is an ideal plane giving the mean value of z corresponding to
a pair of values x, y.

Required to find a, b, c so that the observed deviations of
observed values of z from the values given by this equation
have the least improbability.

Let $z_s$ be the average of $k_s$ observations, each of which has
for its x, y members $x_s$ (to $x_s + \delta x$) and $y_s$ (to $y_s + \delta y$).

Write $\eta_s$ for $z_s - (ax_s + by_s + c)$, i.e. the deviation of the
mean of the observations in the sth group from its ideal value.



Let $\sigma_x$, $\sigma_y$, $\sigma_z$ be the standard deviations of the frequency
groups of x, y, z.

Then, if in the long run the standard deviation, σ, of a group
in z is independent of the values of x and y, the standard
deviation of $\eta_s$ is $\sigma/\sqrt{k_s}$ (formula (38)), and the probability of the
occurrence of $\eta_s$ (to $\eta_s + \delta\eta$) is $K e^{-k_s\eta_s^2/2\sigma^2}\,\delta\eta$.

Let there be n pairs of values such as $x_s$, $y_s$ and N observations
in all, so that $N = k_1 + \ldots + k_s + \ldots + k_n$.

The probability of the concurrence of $\eta_1 \ldots \eta_s \ldots \eta_n$ is
$C e^{-\phi/2\sigma^2}$, where $\phi = k_1\eta_1^2 + \ldots + k_s\eta_s^2 + \ldots + k_n\eta_n^2$, and C is
constant.

The probability is greatest when $\phi = \mathrm{S}_1^n\, k_s[z_s - (ax_s + by_s + c)]^2$
is least, and a, b, and c must be chosen to give this result.

$$\phi = \mathrm{S}(k_s z_s^2) + a^2\mathrm{S}(k_s x_s^2) + b^2\mathrm{S}(k_s y_s^2) + c^2\mathrm{S}k_s - 2a\mathrm{S}(k_s x_s z_s) - 2b\mathrm{S}(k_s y_s z_s) - 2c\mathrm{S}(k_s z_s) + 2ab\mathrm{S}(k_s x_s y_s) + 2ac\mathrm{S}(k_s x_s) + 2bc\mathrm{S}(k_s y_s).$$

Here $\mathrm{S}k_s x_s = 0 = \mathrm{S}k_s y_s$. $\mathrm{S}k_s z_s$ = sum of all values of z = 0.

$\mathrm{S}k_s x_s^2 = N\sigma_x^2$, $x_s$ being repeated $k_s$ times in the whole group,
and $\mathrm{S}k_s y_s^2 = N\sigma_y^2$.

$\mathrm{S}k_s x_s z_s = \mathrm{S}xz$, since $k_s z_s$ = sum of values of z in the sth group,
and $\mathrm{S}k_s y_s z_s = \mathrm{S}yz$. Also $\mathrm{S}k_s x_s y_s = \mathrm{S}xy$.

$$\phi = \mathrm{S}(k_s z_s^2) + Na^2\sigma_x^2 + Nb^2\sigma_y^2 - 2a\mathrm{S}xz - 2b\mathrm{S}yz + 2ab\mathrm{S}xy + Nc^2.$$

Write $\mathrm{S}xy = N r_{xy}\sigma_x\sigma_y$, $\mathrm{S}xz = N r_{xz}\sigma_x\sigma_z$, $\mathrm{S}yz = N r_{yz}\sigma_y\sigma_z$.

Then φ is a minimum when $\dfrac{\partial\phi}{\partial a}$, $\dfrac{\partial\phi}{\partial b}$, $\dfrac{\partial\phi}{\partial c}$ are each zero,* i.e., when

$$\frac{\partial\phi}{\partial c} = 2Nc = 0 \text{ and } c = 0,$$

$$\frac{\partial\phi}{\partial a} = 2N(a\sigma_x^2 + b\sigma_x\sigma_y r_{xy} - \sigma_x\sigma_z r_{xz}) = 0,$$

and

$$\frac{\partial\phi}{\partial b} = 2N(a\sigma_x\sigma_y r_{xy} + b\sigma_y^2 - \sigma_y\sigma_z r_{yz}) = 0.$$

Hence

$$a\sigma_x + b\sigma_y r_{xy} = \sigma_z r_{xz}$$

and

$$a\sigma_x r_{xy} + b\sigma_y = \sigma_z r_{yz}.$$

$$\frac{a\sigma_x}{r_{xz} - r_{xy}r_{yz}} = \frac{b\sigma_y}{r_{yz} - r_{xz}r_{xy}} = \frac{\sigma_z}{1 - r_{xy}^2} \qquad (114)$$

* The values of a, b and c can be obtained without differentiation by
expressing φ as the sum of squares.

The equation z = ax + by + c becomes

$$\frac{z}{\sigma_z} = R_x \cdot \frac{x}{\sigma_x} + R_y \cdot \frac{y}{\sigma_y} \qquad (115)$$

where

$$R_x = \frac{r_{xz} - r_{xy}r_{yz}}{1 - r_{xy}^2}, \qquad R_y = \frac{r_{yz} - r_{xy}r_{xz}}{1 - r_{xy}^2}.$$

$R_x$ and $R_y$ are called partial regression coefficients between z, x
and z, y; for a given y, $z = R_x\dfrac{\sigma_z}{\sigma_x}\,x$ + const., and for a given x,
$z = R_y\dfrac{\sigma_z}{\sigma_y}\,y$ + const., formulae which may be compared with
$y = r\,\dfrac{\sigma_y}{\sigma_x}\,x$ given above (p. 362).

Similar equations are of course to be obtained when x or y
are expressed in terms of y, z or x, z.

The partial correlation coefficient between x and z (y constant)
is defined, by analogy with the case of only two variables, as the
geometric mean of the partial regression coefficients found respectively
when z is expressed in terms of x and y as in (115) and when
x is expressed in terms of z and y; it is therefore

$$\frac{r_{xz} - r_{xy}r_{yz}}{\sqrt{1 - r_{xy}^2}\,\sqrt{1 - r_{yz}^2}}.$$
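The passage from the two linear equations to formulae (114) and (115), and the definition of the partial correlation coefficient as a geometric mean, can be illustrated by a small sketch. The function names and the numerical values in the usage lines are hypothetical, chosen only to exercise the formulae; they are not data from the book.

```python
import math


def partial_regression(r_xy, r_xz, r_yz):
    """R_x and R_y of formula (115) for z expressed in terms of x and y."""
    d = 1.0 - r_xy ** 2
    return (r_xz - r_xy * r_yz) / d, (r_yz - r_xy * r_xz) / d


def solve_plane(sx, sy, sz, r_xy, r_xz, r_yz):
    """Solve a*sx + b*sy*r_xy = sz*r_xz and a*sx*r_xy + b*sy = sz*r_yz
    for a and b by Cramer's rule."""
    det = sx * sy * (1.0 - r_xy ** 2)
    a = (sz * r_xz * sy - sy * r_xy * sz * r_yz) / det
    b = (sx * sz * r_yz - sx * r_xy * sz * r_xz) / det
    return a, b


def partial_correlation_xz(r_xy, r_xz, r_yz):
    """Partial correlation between x and z, y constant."""
    return (r_xz - r_xy * r_yz) / math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_yz ** 2))


# Hypothetical standard deviations and correlations (illustration only).
sx, sy, sz = 2.0, 3.0, 5.0
r_xy, r_xz, r_yz = 0.3, 0.6, 0.5

a, b = solve_plane(sx, sy, sz, r_xy, r_xz, r_yz)
R_x, R_y = partial_regression(r_xy, r_xz, r_yz)
# Coefficient of z when x is expressed in terms of z and y.
R_z_for_x = (r_xz - r_xy * r_yz) / (1.0 - r_yz ** 2)
rho = partial_correlation_xz(r_xy, r_xz, r_yz)
print(round(a, 4), round(R_x * sz / sx, 4), round(rho, 4))
```

With these inputs a agrees with R_x σz/σx, and ρ is the geometric mean of R_x and the corresponding coefficient obtained when x is expressed in terms of z and y.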

The foregoing analysis is based on Mr. Yule's paper
(Statistical Journal, 1897, pp. 831 seq.) and book, and to him
is due a great part of the work on this subject. The treatment
here differs from his only by the important consideration that
it is based on the prevalence of the law of error as discussed
above (p. 298), and that it makes the assumption that the
standard deviation of z is independent of the values of x and y,
which is by no means universal; while Mr. Yule does not need
this assumption, but uses the method of least squares, a method
which is not used (except very rarely) in this book, because of
the difficulties that underlie the principles involved.

The equation between z and x and y is the same as Mr.
Yule's, and also the same as obtained (see p. 405 below) from
the theory of normal multiple correlation.


Example 1. The Cost of Living Committee, 1918, collected
a number of budgets of the weekly expenditure on food in
working-class families (see p. 310); 390 of these, obtained from
families of the skilled classes, were grouped according to the
numbers of persons in the households above and below 14 years
of age (see Statistical Journal, 1919, p. 360).

The notation and quantities involved are as follows:

                              Expenditure on       Number over       Number under
                              food.                14 years.         14 years.

Average . . . . . . . . .     51s.                 2.48              3.56
Difference from average .     z × 5s.              x                 y
Standard deviation . . .      σz = 3.03 × 5s.      σx = .836         σy = 1.40

r_xy = -.0525, r_zx = .504, r_zy = .315.

The equation obtained is z/σz = .52 x/σx + .35 y/σy, which
leads to the formula:

Expenditure (shillings) = 14.5 + 9.4 × number over 14 years
+ 3.7 × number of children under 14 years,

and to the following table:

Family Expenditure on Food.* (Shillings.)

Number of                By Formula.                     Average of actual cases.
persons over          Number of children.                  Number of children.
14 years.          2       3       4       5           2           3           4           5

2 . . .          40.7    44.4    48.1    51.8      40.5 (74)   45.2 (74)   47.1 (53)   52.9 (25)
3 . . .          50.1    53.8    57.5    61.2      54.8 (21)   51.2 (17)   58.2 (16)   64.9 (17)
4 . . .          59.5    63.2    66.9    70.6      58.0 (10)   60.2 (10)   78.1 (6)     ..  (0)

The numbers in brackets are the numbers of actual cases averaged.

The agreement between experience and formula is as close
as can be expected, when the considerable standard deviation
and the small numbers of cases are considered.

From this example it becomes clear that the method of
partial correlation is closely akin to the ordinary way of
analysis in cross-tabulation; but the use of the formula brings
the separate results into coherent relation. Here we have
the result that on the average an additional adult (who
generally increases family income) adds 9s. 5d. to the family
food expenditure, while an additional child adds only 3s. 8d.;
the greater the number of children the lower is the standard of
living, since a child's nourishment costs about two-thirds of
that of an adult. (Here "adult" is used for a person over
14 years.)

* For the working out of these figures and those on p. 310 I am indebted
to Miss King and Miss Mackenzie at the School of Economics.
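The "By Formula" half of the table in Example 1 follows directly from the printed formula, and the sums of 9s. 5d. and 3s. 8d. are the coefficients 9.4 and 3.7 expressed in shillings and pence. The sketch below merely repeats that arithmetic; the function names are illustrative.

```python
def food_expenditure(adults, children):
    """Weekly food expenditure in shillings by the formula of Example 1."""
    return 14.5 + 9.4 * adults + 3.7 * children


def shillings_and_pence(value):
    """Split a decimal number of shillings into (shillings, nearest pence)."""
    shillings = int(value)
    pence = round((value - shillings) * 12)  # 12 pence to the shilling
    return shillings, pence


# Rows: 2, 3, 4 persons over 14; columns: 2, 3, 4, 5 children.
by_formula = {
    adults: [round(food_expenditure(adults, c), 1) for c in (2, 3, 4, 5)]
    for adults in (2, 3, 4)
}
print(by_formula[2], shillings_and_pence(9.4), shillings_and_pence(3.7))
```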

Example 2. The following data are obtained for the County
of London from the Census of 1911.*

z + 3.7 is the number of rooms to a tenement.

x + 4.15 is the number of persons in a family.

y + .86 is the number of children under 10 in a family.

The averages for the county are 3.7, 4.15 and .86 respectively.

r_xy = .57, r_xz = .44, r_yz = .03, σz = 2.59, σx = 2.32, σy = 1.24,

R_x = .676, R_y = -.402.

The figures relate to 1,023,951 families, sufficient for
accuracy to three significant figures.

z = x × (σz/σx) × .676 - y × (σz/σy) × .402 = x × .754 - y × .840

or (rooms - 3.7) = .754 (persons - 4.15) - .84 (children - .86).

The number of rooms for families of given size decreases
rapidly as the number of children increases.

We may also write:

Number of rooms = 1.29 + .68 persons - .84 children
                = 1.29 + .68 persons over 10 - .16 children under 10.

* I am indebted to Mr. J. W. Nixon for this calculation.

Example 3. In the research on the social conditions in
Reading described in Livelihood and Poverty the income,
rent, and family constitution were tabulated for 586 families.
Rent increases with income and with the number of earners,
but for the same income and the same number of persons it
may be that the more numerous the children the less can be
afforded for rent.

Rent: z + 6.075 shillings, where 6.075 shillings is the
average.

Number of equivalent persons: x + 3.287, where 3.287 is
the average.

Income: y + 31.712 shillings, where 31.712 shillings is the
average.

The number of "equivalent" persons was obtained by
classifying  adults  and  children  on  a  somewhat  arbitrary  scale 
according  to  the  house-room  they  may  be  presumed  to  need  ; 
children  under  5  years  were  counted  as  J,  children  from  5  to  14 
as  J,  boys  of  14  to  18  and  girls  from  14  to  16  as  §,  and  older 
persons  as  1. 

The correlation between rent and number of rooms is
close, so that rent may be taken as measuring house-room.

σz = 1.33, σx = 1.22, σy = 13.0, r_xy = .543, r_xz = .152, r_yz = .458,
R_x = -.136, R_y = .532.

Hence z/σz = -.136 x/σx + .532 y/σy,

or z = -.148x + .0544y.

House-room  then  decreases  perceptibly  as  the  size  of  the 
family  increases,  for  given  incomes. 
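The coefficients of Example 3 can be recomputed from the three correlation coefficients by formula (115). The sketch below is an arithmetical check only; the small differences in the third decimal place from the printed -.136 and -.148 presumably come from rounding in the original working.

```python
def partial_regression(r_xy, r_xz, r_yz):
    """R_x and R_y of formula (115) for z expressed in terms of x and y."""
    d = 1.0 - r_xy ** 2
    return (r_xz - r_xy * r_yz) / d, (r_yz - r_xy * r_xz) / d


# Reading data: z rent, x equivalent persons, y income.
sd_z, sd_x, sd_y = 1.33, 1.22, 13.0
R_x, R_y = partial_regression(r_xy=0.543, r_xz=0.152, r_yz=0.458)

# Coefficients in the original units (shillings of rent per person,
# and per shilling of income).
coef_x = R_x * sd_z / sd_x
coef_y = R_y * sd_z / sd_y
print(round(R_x, 3), round(R_y, 3), round(coef_x, 3), round(coef_y, 4))
```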

Each  of  the  three  examples  shows  that  families  with 
children  tend  to  secure  relatively  less  food  and  less  house- 
room  per  head  than  families  without  children,  and  to  some 
extent  measures  the  loss. 


We  have  still  to  consider  the  theoretic  distribution  in  three 
dimensions  of  three  variables,  corresponding  to  the  normal 
correlation  surface  for  two  variables.  The  following  pages 
show  the  results  and  the  analysis  in  simple  cases.  It  will  be 
observed  that  the  same  lines  of  proof  are  followed  as  in  the 
case  of  two  variables. 

Multiple  Correlation  Surface. 

The  following  analysis  is  only  valid  on  the  assumption 
that  the  elements  have  normal  frequency. 

Let  X,  Y,  Z  be  three  quantities  which  depend  on  other  quan¬ 
tities  U,  Vi,  V2,  V3  in  such  a  way  that  X  =  U  -f-  Y  =  U  -{-  V2, 
Z  =  U  +  V3. 

Let  U,  V1;  V2,  V3  be  chosen  at  random  from  normal  frequency 
groups  whose  averages  are  u,  vx  .  .  .  and  standard  deviations 

&Uy  &Vx  *  ’  * 
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Let  X,  Y,  Z  be  the  averages  and  <jx>  &y,  <*z  the  standard  devia¬ 
tions  of  X,  Y,  Z,  and  let  X  =  X  -f  x  .  .  .  U  =  U  +  u  .  .  . 

Then  in  the  long  run  X  =  U  +  V  etc.  and 

x  —  uJrv1,  y  =  u  -f  v2,  z  =  u  -\~  v3. 

Suppose  u,  vv  v2>  v3  quite  independent  of  each  other,  so  that 
mean  uv1  —  o  =  mean  v±vz  etc. 

Write  rXIJ  for  the  coefficient  of  correlation  between  x  andy. 

Then  <xa;2  =  o-«2  +  o-v  2  .  .  . , 


and  cr X(t yYxy  —  mean  (u  -f  zq)  (u  +  v2)  =  cr,,2 

—  VyVzYyz  ~  &z(f%Yz%' 

The  joint  chance  of  selected  values  of  xv  yv  z±  arising  from 
particular  values  u,  vv  v2,  v3  is 


u2 


<r„ 


_  4 
2 


<Tu 


V  2 


X 


7 r 


cr 


V/27 r 


X 


X 


=  p«. 


subject  to  the  conditions  xt  =  u  +  vv  yx  =  u  +  v2>  zx  =  u  +  v3. 
Eliminate  vv  v2,  v3. 

The  joint  chance  of  xv  yv  z1  arising  from  a  particular  value  u 
is  given  by 


—  2  log  (PM  .  /\rr2cr ucr Vlcr vyr Va) 


u*  ( u  —  x i)2  ,  (u-yj2  ,  (U-  Zx)2 


cr  n‘ 


T 


<r 


2' 


T 


Vi 


or 


Vi. 


where 


a 


cr 


III 

+  ~2  + 


o"v3 

I 


a(u  —  b)2jrC, 


cr 


Vi 


cr  v 


2  i 


Vi 


c r 


Vs 


aft 


J-  +  S3,  + 


t’2 


o 

*  2 


-2+^4  + 

Vi  <TD2 


The  whole  chance  of  the  selected  values  x1  y\  z1  arising  from  any 
value  of  u  is  P  =  I  P«  du,  xx  y1  z1  being  regarded  as  constant 


(2ir)’<ru<rVl<rVt<rc3  J  -  V 2ir 

Write  x,  y,  z  for  xv  yv  zv 


(27r)  ^(Tu(T  Vi<t  t,2cr  Va  Va 


=l  e 


ac 


i  i  , 

+  ~o  +  •  *  • 


<Tu 


Vi 


x- 


(T 


+  •  •  • 


X 


a r 


+ 
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Now 


III 

“2  +  Z-2+- 


+ 


,r  2_  2  ,,  4 

(Ty  cr  z  (T  'H 


<?u“  '  Vv22  <jv2  CTU 2  1  cry2  —  (T,,2  1  cr?2  —  cr,,2  <tx2(tv2<tv2 

,  I,  I  1  I 

^  '  2  ~T  *  2  2  4“  2  2  'T  o  ~o 

(Ttr  o*  —  (TV2  (Ty 2  —  (TiT  cr?2  —  ow2 

o-x2CTy2CTy 2  —  cr,,4  ((Ta;2  +  cr?/2  +  0'22)  4*  2cr 


tt 


rr  2,r  2,,  2._  2 
CT((,  U  (;3 

.  C  {(rx2(Ty2(jy2  —  crw4  (<xx2  -f  O'?/2  +  crz2)  +  2(Tu6( 

=  X2  (cTj/V- <rtt4)  +  +  —  2%y<jx?(Tv2  — 

Write  RcreV/cr?2  for  crx2(Ty2(Tz2  —  cr,,4  (era;2  +  +)  +  2crlt6, 

SO  that  R  ==  I  ~f~  ^VxyYyzYzx  YXi2  r^2  Yzx2  - 

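The chance of the concurrence of x, y, z is then

$$P = \frac{1}{(2\pi)^{3/2}\sigma_x\sigma_y\sigma_z\sqrt{R}}\exp\left[-\frac{1}{2R}\left\{\frac{x^2}{\sigma_x^2}(1 - r_{yz}^2) + \dots + \dots - \frac{2xy}{\sigma_x\sigma_y}(r_{xy} - r_{xz}r_{yz}) - \dots - \dots\right\}\right] \quad (116)$$

for

$$\frac{r_{xy} - r_{xz}r_{yz}}{\sigma_x\sigma_y} = \frac{\sigma_x\sigma_y r_{xy}\,\sigma_z^2 - \sigma_x\sigma_z r_{xz}\cdot\sigma_y\sigma_z r_{yz}}{\sigma_x^2\sigma_y^2\sigma_z^2} = \frac{\sigma_u^2\sigma_z^2 - \sigma_u^2\cdot\sigma_u^2}{\sigma_x^2\sigma_y^2\sigma_z^2} = \frac{\sigma_u^2\sigma_{v_3}^2}{\sigma_x^2\sigma_y^2\sigma_z^2}.$$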


In the special case where

$\sigma_u = \sigma_{v_1} = \sigma_{v_2} = \sigma_{v_3}$, $\sigma_x^2 = 2\sigma_u^2 = \sigma^2$, say,
$r_{xy} = \frac12 = r_{yz} = r_{zx}$, and $R = \frac12$.

The chance is then

$$\frac{1}{2\sigma^3\pi^{3/2}}\,e^{-\frac{1}{4\sigma^2}\{3(x^2 + y^2 + z^2) - 2(xy + yz + zx)\}}.$$

The most probable value of z for given values of x and y is obtained from $\frac{\partial P}{\partial z} = 0$, and is

$$\frac{z}{\sigma_z}(1 - r_{xy}^2) = \frac{x}{\sigma_x}(r_{xz} - r_{xy}r_{yz}) + \frac{y}{\sigma_y}(r_{yz} - r_{xy}r_{xz}),$$

as in formula (115).

In the special case this becomes $z = \frac13(x + y)$.
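The special case can be checked by direct arithmetic. The following is a minimal sketch, not part of the original text: the function names `R` and `most_probable_z` are illustrative assumptions, and exact fractions are used so that the values $R = \frac12$ and $z = \frac13(x + y)$ appear without rounding.

```python
from fractions import Fraction


def R(rxy, ryz, rzx):
    """Determinant of the 3x3 correlation matrix, written out as in the text."""
    return 1 + 2 * rxy * ryz * rzx - rxy ** 2 - ryz ** 2 - rzx ** 2


def most_probable_z(x, y, sx, sy, sz, rxy, ryz, rzx):
    """Solve (z/sz)(1 - rxy^2) = (x/sx)(rzx - rxy*ryz) + (y/sy)(ryz - rxy*rzx) for z."""
    rhs = (x / sx) * (rzx - rxy * ryz) + (y / sy) * (ryz - rxy * rzx)
    return sz * rhs / (1 - rxy ** 2)


# Special case: equal standard deviations and all correlations equal to 1/2.
half = Fraction(1, 2)
one = Fraction(1)
special_R = R(half, half, half)
special_z = most_probable_z(Fraction(3), Fraction(6), one, one, one, half, half, half)
print(special_R, special_z)  # x = 3, y = 6 gives z = (3 + 6)/3
```

The sample values x = 3 and y = 6 are arbitrary; any pair gives one third of their sum.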


It is shown by Elderton (following Pearson) that if x, y and z are the sum of any finite number of variables, such as the u, v, . . . above, all of normal frequency, and some common to a pair (x, y), (y, z) or (z, x) and others occurring in only one of these, then P is of the form $Ke^{-(ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy)}$, where a, b, c, f, g, h are constants to be determined.
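Take the aggregate of the chances to be unity.

Let A, B, C, F, G, H be the minors of the determinant

$$\Delta = \begin{vmatrix} a & h & g \\ h & b & f \\ g & f & c \end{vmatrix},$$

so that $A = bc - f^2$, $F = hg - af$, . . . , $BC - F^2 = a\Delta$, . . .

Then

$$-\log\frac{P}{K} = a\left(x + \frac{h}{a}y + \frac{g}{a}z\right)^2 + \frac{C}{a}\left(y - \frac{F}{C}z\right)^2 + \frac{\Delta}{C}z^2.$$

$$1 = \iiint P\,dx\,dy\,dz = \pi^{3/2}\cdot K\cdot\frac{1}{\sqrt a}\sqrt{\frac{a}{C}}\sqrt{\frac{C}{\Delta}}, \quad\text{and}\quad K\pi^{3/2} = \Delta^{1/2}.$$

$$\sigma_z^2 = \iiint Pz^2\,dx\,dy\,dz = K\pi\cdot\frac{1}{\sqrt a}\cdot\sqrt{\frac{a}{C}}\int z^2 e^{-\frac{\Delta}{C}z^2}dz = K\pi^{3/2}\frac{1}{\sqrt C}\cdot\frac12\left(\frac{C}{\Delta}\right)^{3/2} = \frac{C}{2\Delta}.$$

$$\sigma_y^2 = \frac{B}{2\Delta}, \qquad \sigma_x^2 = \frac{A}{2\Delta}.$$

Similarly

$$\sigma_y\sigma_z r_{yz} = \iiint Pzy\,dx\,dy\,dz = K\sqrt{\pi}\,a^{-1/2}\iint zy\,e^{-\frac{C}{a}\left(y - \frac{F}{C}z\right)^2 - \frac{\Delta}{C}z^2}dy\,dz = K\sqrt{\pi}\,a^{-1/2}\iint z\left(y' + \frac{F}{C}z\right)e^{-\frac{C}{a}y'^2 - \frac{\Delta}{C}z^2}dy'\,dz,$$

where $y' = y - \frac{F}{C}z$ and the limits of the integration are $\pm\infty$,

$$= K\pi\cdot a^{-1/2}\sqrt{\frac{a}{C}}\cdot\frac{F}{C}\int z^2 e^{-\frac{\Delta}{C}z^2}dz = \frac{F}{2\Delta}.$$

Similarly $\sigma_x\sigma_y r_{xy} = \frac{H}{2\Delta}$ and $\sigma_x\sigma_z r_{xz} = \frac{G}{2\Delta}$.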

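$$a\Delta = BC - F^2 = 4\sigma_y^2\sigma_z^2(1 - r_{yz}^2)\Delta^2$$

$$f\Delta = GH - AF = 4\sigma_x^2\sigma_y\sigma_z(r_{xy}r_{xz} - r_{yz})\Delta^2$$

$$\Delta^2 = ABC + 2FGH - AF^2 - BG^2 - CH^2 = 8\Delta^3\sigma_x^2\sigma_y^2\sigma_z^2(1 + 2r_{xy}r_{yz}r_{zx} - r_{yz}^2 - r_{zx}^2 - r_{xy}^2).$$

Write R for the quantity inside the bracket.

$$R = \begin{vmatrix} 1 & r_{xy} & r_{xz} \\ r_{xy} & 1 & r_{yz} \\ r_{xz} & r_{yz} & 1 \end{vmatrix}$$

Then

$$\Delta = \frac{1}{8R\sigma_x^2\sigma_y^2\sigma_z^2}, \qquad a = \frac{1 - r_{yz}^2}{2R\sigma_x^2}, \qquad f = \frac{r_{xy}r_{xz} - r_{yz}}{2R\sigma_y\sigma_z}.$$

Hence

$$P = \frac{1}{\sqrt{R}\,(2\pi)^{3/2}\sigma_x\sigma_y\sigma_z}\exp\left[-\frac{1}{2R}\left\{\frac{x^2}{\sigma_x^2}(1 - r_{yz}^2) + \dots + \dots + \frac{2yz}{\sigma_y\sigma_z}(r_{xy}r_{xz} - r_{yz}) + \dots + \dots\right\}\right],$$

as obtained in the special case above.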

x  y  z 


If instead of x, y, z there are n quantities $x_1, x_2 \dots x_n$ a more general proof on the same lines (due to Prof. Karl Pearson) leads to

$$P = \frac{1}{\sqrt{R}\,(2\pi)^{n/2}\sigma_1\sigma_2\dots\sigma_n}\exp\left[-\frac{1}{2R}\left\{R_{11}\frac{x_1^2}{\sigma_1^2} + \dots + 2R_{12}\frac{x_1x_2}{\sigma_1\sigma_2} + \dots\right\}\right],$$

where

$$R = \begin{vmatrix} 1 & r_{12} & r_{13} & \dots & r_{1n} \\ r_{21} & 1 & r_{23} & \dots & r_{2n} \\ \vdots & & & & \vdots \\ r_{n1} & r_{n2} & r_{n3} & \dots & 1 \end{vmatrix},$$

$r_{st}$ being the same quantity as $r_{ts}$, and $R_{st}$ being the minor obtained by crossing out the sth column and tth row, with its appropriate sign.

The following note indicates the course of the proof.

As before $P = Ke^{-\phi}$, where $\phi = a_{11}x_1^2 + \dots + 2a_{12}x_1x_2 + \dots$, $a_{11} \dots a_{12} \dots$ being constants.

Let

$$\Delta_n = \begin{vmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{vmatrix},$$

where $a_{st} = a_{ts}$, and let $\Delta_{st}$ be a minor obtained by crossing out the sth column and tth row.

Then $\phi$ may be expanded so that $x_1$ occurs only in the first term, $x_2$ only in the first two terms, etc., and then

$$\phi = a_{11}\left(x_1 + \frac{a_{12}}{a_{11}}x_2 + \dots\right)^2 + \dots + \frac{\Delta_n}{\Delta_{n-1}}x_n^2,$$

where $\Delta_{n-1}$ stands for $\Delta_{nn}$, as may be shown by a rather troublesome induction.


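$$1 = \int\!\dots\!\int P\,dx_1\,dx_2\dots dx_n = K\pi^{n/2}\left(\frac{1}{a_{11}}\right)^{1/2}\left(\frac{a_{11}}{\Delta_2}\right)^{1/2}\dots\left(\frac{\Delta_{n-1}}{\Delta_n}\right)^{1/2} = \frac{K\pi^{n/2}}{(\Delta_n)^{1/2}}.$$

$$\sigma_n^2 = \int\!\dots\!\int Px_n^2\,dx_1\,dx_2\dots dx_n = \frac{\Delta_{n-1}}{2\Delta_n}.$$

Similarly, by changing the order, $\sigma_s^2 = \frac{\Delta_{ss}}{2\Delta_n}$.

$$\sigma_n\sigma_{n-1}r_{n,n-1} = \int\!\dots\!\int Px_nx_{n-1}\,dx_1\dots dx_n = \frac{\Delta_{n,n-1}}{2\Delta_n}.$$

Similarly, by changing the order, $\sigma_s\sigma_t r_{st} = \frac{\Delta_{st}}{2\Delta_n}$.

Substitute these values for $r_{12}$ etc. in the determinant giving R, and we obtain

$$R = \frac{1}{(2\Delta_n)^n\sigma_1^2\dots\sigma_n^2}\begin{vmatrix} \Delta_{11} & \Delta_{12} & \dots & \Delta_{1n} \\ \Delta_{21} & \Delta_{22} & \dots & \Delta_{2n} \\ \vdots & & & \vdots \\ \Delta_{n1} & \Delta_{n2} & \dots & \Delta_{nn} \end{vmatrix} = \frac{\Delta_n^{\,n-1}}{(2\Delta_n)^n\sigma_1^2\dots\sigma_n^2},$$

by a well-known theorem in determinants.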
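$$R_{11} = \frac{1}{(2\Delta_n)^{n-1}\sigma_2^2\dots\sigma_n^2}\begin{vmatrix} \Delta_{22} & \dots & \Delta_{2n} \\ \vdots & & \vdots \\ \Delta_{n2} & \dots & \Delta_{nn} \end{vmatrix} = \frac{a_{11}\,\Delta_n^{\,n-2}}{(2\Delta_n)^{n-1}\sigma_2^2\dots\sigma_n^2},$$

by another well-known theorem, and hence

$$a_{11} = \frac{R_{11}}{2R\sigma_1^2}.$$

Similarly $a_{12} = \frac{R_{12}}{2R\sigma_1\sigma_2}$, and by changing the order $a_{st} = \frac{R_{st}}{2R\sigma_s\sigma_t}$, so that we obtain the formula as given at the bottom of p. 406.

The most probable value of $x_1$ for given values of $x_2, \dots x_n$ is then given by

$$R_{11}\frac{x_1}{\sigma_1} = -R_{12}\frac{x_2}{\sigma_2} - R_{13}\frac{x_3}{\sigma_3} - \dots$$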

CHAPTER  IX. 


PRECISION OF MEASUREMENTS OF AVERAGES, MOMENTS AND CORRELATION.*

Inverse Probability.

In  the  previous  chapters  the  problems  of  the  errors  that 
arise  in  the  process  of  sampling  have  been  chiefly  discussed 
from  the  point  of  view  of  the  universe,  not  of  the  sample  ; 
that  is,  the  question  has  been  how  far  will  a  sample  represent 
a  given  universe  ?  The  practical  question  is,  however,  the 
converse  :  what  can  we  infer  about  a  universe  from  a  given 
sample  ?  This  involves  the  difficult  and  elusive  theory  of 
inverse  probability,  for  it  may  be  put  in  the  form,  which  of 
the  various  universes  from  which  the  sample  may  a  priori 
have  been  drawn  may  be  expected  to  have  yielded  that 
sample  ? 

To  make  the  argument  clear  it  seems  expedient  to  make  a 
short  digression  on  the  theory  of  inverse  probability.  The 
following  examples  illustrate  the  problem  and  its  solution. 

A  sovereign  and  two  shillings  were  in  a  purse.  One  coin 
is  lost.  One  of  the  remaining  two  is  taken  out  and  is  found 
to  be  a  shilling.  What  is  the  chance  that  the  sovereign  was 
lost  ? 

The a priori chance that the sovereign was lost is $p_1' = \frac13$, and that a shilling was lost $p_2' = \frac23$, if we assume that the loss of any one coin was as likely as any other.

If the sovereign was lost, the chance of drawing a shilling was $p_1 = 1$, since there is no other to draw.


* See Edgeworth in Statistical Journal, 1908, pp. 381 seq.; Yule, Introduction to the Theory of Statistics, last chapter; Transactions of the Royal Society, Pearson and Filon, Vol. 191 (A. 220), and Sheppard, Vol. 192 (A. 229), 1898; Biometrika, Vol. II, Part III, p. 280.

If a shilling was lost, the chance of drawing a shilling was $p_2 = \frac12$.

The a priori chance that it should be a sovereign that is lost and a shilling that is drawn is $p_1'p_1 = \frac13$.

The a priori chance that it should be a shilling that is lost and a shilling that is drawn is $p_2'p_2 = \frac23 \times \frac12 = \frac13$.

By  hypothesis  one  of  these  equally  probable  double  events 
has  happened,  and  there  is  nothing  in  the  data  to  show  which. 

It  is  therefore  just  as  likely  that  the  third  coin  is  a  sovereign 
as  a  shilling. 

We may generalise this proposition in the following way.
Of various possible events whose chances of occurrence are respectively $p_1', p_2' \dots p_t' \dots$ one is known to have taken place. A further result is found, whose probability, if the first, second . . . tth . . . event had happened, would have been $p_1, p_2 \dots p_t \dots$ respectively.

The a priori chance that the tth event happened and produced the result is $p_t' \times p_t$.

A priori the chances that the events of the first series would happen and produce the result are in the ratios

$$p_1'p_1 : p_2'p_2 : \dots : p_t'p_t \dots = P_1 : P_2 : \dots$$

But we know that one or other of these did happen, and this additional knowledge does not affect the relative magnitudes $P_1, P_2 \dots$, but raises their total in such a ratio, K, that it equals 1, which represents certainty on the scale of algebraic chance. Hence $K\cdot\Sigma P_t = 1$, and the chance that it was the tth event in the first series is

$$KP_t = \frac{p_t'p_t}{p_1'p_1 + \dots + p_t'p_t + \dots}$$
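The rule just stated is short enough to write out as a computation. The sketch below is an illustration only, applied to the purse example; the function name `posterior` and the variable names are assumptions introduced here and do not appear in the original.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return K * p_t' * p_t for each t, where K makes the total equal to 1."""
    joint = [p_prior * p_like for p_prior, p_like in zip(priors, likelihoods)]
    total = sum(joint)  # this is the sum of the P_t, so K = 1 / total
    return [j / total for j in joint]


# Purse example: events are "sovereign lost" and "a shilling lost".
priors = [Fraction(1, 3), Fraction(2, 3)]      # p_1', p_2'
likelihoods = [Fraction(1), Fraction(1, 2)]    # chance of then drawing a shilling
purse = posterior(priors, likelihoods)
print(purse)  # both events come out equally likely
```

Exact fractions are used so that the equality of the two chances is seen without rounding.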


In  a  bag  there  are  six  similar  balls  which  are  known  to  be 
black  or  white.  One  is  drawn  and  is  found  to  be  white.  What 
can  be  inferred  as  to  the  original  number  of  white  balls  in  the 
bag  ? 

The  answer  depends  on  the  hypothesis  made  as  to  the 
a  priori  chances  of  distribution  between  black  and  white. 

If a priori each ball is equally likely to be black or white and $p_t'$ is the chance that t were white, $p_t' = {}^6C_t\cdot\frac{1}{2^6}$.

$p_t = \frac{t}{6}$ whatever the hypothesis.

$$P_0 : P_1 : \dots : P_6 = \frac{1}{6\times 2^6}\,(0 : 6 : 30 : 60 : 60 : 30 : 6).$$

$\Sigma P_t = \frac12$ and $K = 2$.

The chances that there were 0, 1, 2, 3, 4, 5, 6 white balls are respectively $0, \frac{1}{32}, \frac{5}{32}, \frac{10}{32}, \frac{10}{32}, \frac{5}{32}, \frac{1}{32}$.

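But if the number of white in the bag had been determined by throwing a die and taking the number on the upper face, then

$$p_1' = p_2' = \dots = p_6' = \frac16, \qquad p_t = \frac{t}{6}, \qquad P_t = \frac{t}{36}, \qquad \Sigma P_t = \frac{21}{36}, \qquad K = \frac{12}{7}, \quad\text{and}\quad KP_t = \frac{t}{21}.$$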


More generally if there were n balls in the bag and the number of white had been determined by spinning a disc, marked on its circumference with the numbers 0, 1 . . . n equally spaced, on a vertical axis, and taking the number nearest a fixed point adjacent to its circumference when it came to rest, then

$$p_t' = \frac{1}{n+1}, \qquad p_t = \frac{t}{n}, \qquad \Sigma P_t = \frac12, \qquad K = 2, \qquad KP_t = \frac{2t}{n(n+1)}.$$

The aggregate chance that originally the number of white balls was t or less is

$$\Sigma\,(KP_t) = \frac{t(t+1)}{n(n+1)} = f(t), \text{ say.}$$

If $f(t) = \frac12$ it is as likely as not that there were as many as t white; and, if n is great, $t = \frac{n}{\sqrt2}$ satisfies this equation approximately.

Hence, when n is great, it is as likely as not that the proportion of white balls to all was as great as $\frac{1}{\sqrt2} = ·7 \dots$

The chance is approximately $\frac12$ that the proportion was between $\frac12$ and $\frac{\sqrt3}{2}$.
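The three hypotheses about the bag can be compared numerically. What follows is a hedged sketch, not the author's working: `posterior` and `spinner_cdf` are names invented for illustration, and the code simply applies the rule of inverse probability to the a priori chances described in the text.

```python
from fractions import Fraction
from math import comb


def posterior(priors, likelihoods):
    """Chance of each hypothesis after the observation (inverse probability)."""
    joint = [a * b for a, b in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


n = 6
draw_white = [Fraction(t, n) for t in range(n + 1)]  # p_t = t/6

# Hypothesis 1: each ball independently black or white with chance 1/2.
binomial_prior = [Fraction(comb(n, t), 2 ** n) for t in range(n + 1)]
post_binomial = posterior(binomial_prior, draw_white)

# Hypothesis 2: number of white balls fixed by a die, so t = 1..6 equally likely.
die_prior = [Fraction(0)] + [Fraction(1, 6)] * 6
post_die = posterior(die_prior, draw_white)


def spinner_cdf(t, n):
    """Hypothesis 3: t = 0..n equally likely; chance that the number was t or less."""
    prior = [Fraction(1, n + 1)] * (n + 1)
    like = [Fraction(s, n) for s in range(n + 1)]
    return sum(posterior(prior, like)[: t + 1])


print(post_binomial)
print(post_die)
print(spinner_cdf(707, 1000))  # close to 1/2 when t is about n / sqrt(2)
```

The last line takes n = 1000 as an arbitrary large value to show the approach to one half.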

This example is very important, both as showing that the result depends on the hypothesis made as to the relative a priori chances of the unknown events between which we have to choose, and as indicating that we can get a more comprehensive result by aggregating the chances than by taking them singly.


Precision  of  p,  the  Proportion  of  a  Particular  Class  in  a  Universe. 

We will first apply the principle of inverse probability to the determination by sample of p, where pN is the number of things having a certain characteristic in a universe containing N things, and n are selected at random and p'n are found to have the characteristic.

The chance that p'n should have been found from a given p is ${}^nC_{p'n}\,p^{p'n}q^{q'n}$ (p. 262), where $q = 1 - p$, $q' = 1 - p'$.

If all values of p from 0 to 1 are a priori equally probable, then the chance that p'n should be found from any value of p, from p' to x, is the sum of the chances from particular values

$$= \int_{p'}^{x}{}^nC_{p'n}\,x^{p'n}(1 - x)^{q'n}\,dx,$$

and therefore by the theorem on p. 410 the chance that the original value of p was between p' and x is

$$\frac{\displaystyle\int_{p'}^{x}{}^nC_{p'n}\,x^{p'n}(1 - x)^{q'n}\,dx}{\displaystyle\int_{0}^{1}{}^nC_{p'n}\,x^{p'n}(1 - x)^{q'n}\,dx},$$

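which can be reduced as follows to the form of the normal curve of error if $\frac{1}{\sqrt n}$ is neglected.

Write $x = p' + z\sigma$, where $\sigma^2 n = p'q'$, and $1 - x = q' - z\sigma$. $\sigma$ is of order $\frac{1}{\sqrt n}$.

Then

$$P = \frac{\displaystyle\int_0^z\left(1 + \frac{z\sigma}{p'}\right)^{p'n}\left(1 - \frac{z\sigma}{q'}\right)^{q'n}dz}{\displaystyle\int_{-\infty}^{\infty}\left(1 + \frac{z\sigma}{p'}\right)^{p'n}\left(1 - \frac{z\sigma}{q'}\right)^{q'n}dz} = \frac{\displaystyle\int_0^z f(z)\,dz}{\displaystyle\int_{-\infty}^{\infty}f(z)\,dz}, \text{ say,}$$

since if $x = 1$, $z = \frac{q'}{\sigma} = \sqrt{\frac{nq'}{p'}}$, and if $x = 0$, $z = -\frac{p'}{\sigma} = -\sqrt{\frac{np'}{q'}}$, which tend to $\pm\infty$ when n is great.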
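Then

$$\log f(z) = p'n\log\left(1 + \frac{z\sigma}{p'}\right) + q'n\log\left(1 - \frac{z\sigma}{q'}\right) = z\times 0 - \frac{z^2\sigma^2 n}{2}\left(\frac{1}{p'} + \frac{1}{q'}\right) + \text{terms involving } \sigma^3 n,$$

i.e. terms of order $\frac{1}{\sqrt n}$,

$$= -\tfrac12 z^2, \text{ when } \frac{1}{\sqrt n} \text{ is neglected, since } p' + q' = 1.$$

$$\therefore\ P = \frac{\displaystyle\int_0^z e^{-\frac12 z^2}dz}{\displaystyle\int_{-\infty}^{\infty}e^{-\frac12 z^2}dz} = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac12 z^2}dz \quad (117)$$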


Hence the chance that the observations arose from a universe in which the proportion was between p' and $p' + p_1$ is (writing $\frac{p_1}{\sigma}$ for z)

$$\int_0^{p_1}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{u^2}{2\sigma^2}}du, \quad\text{where } \sigma = \sqrt{\frac{p'(1 - p')}{n}}.$$

The  above  analysis  is  based  on  that  given  in  Todhunter’s 
History  of  the  Theory  of  Probability,  pp.  554  seq. 

This is the converse of the theorem that the chance of obtaining a value from p to $p + p_1$ in a sample from a known universe is

$$\int_0^{p_1}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{u^2}{2\sigma^2}}du, \quad\text{where } \sigma = \sqrt{\frac{p(1 - p)}{n}}.$$


The difficulty in the above analysis lies in the assumption that all values of p from 0 to 1 are a priori equally probable. The hypothesis can be elucidated as follows.

Let n = 100 and p' = ·1.

If the observations came from a universe where p = ·07, then $\sigma^2 = \frac{·07 \times ·93}{100}$, $\sigma = ·0255$, $z = ·03 \div ·0255 = 1·18$. The aggregate chance from p = ·06 to p = ·08 is approximately the chance for p = ·07, exactly given by the ordinate $\frac{1}{\sqrt{2\pi}}e^{-\frac12 z^2}$, multiplied by an abscissa of ·02 taken as a multiple of $\sigma$, viz. ·02 ÷ ·0255 = ·78, and equals ·78 of $\frac{1}{\sqrt{2\pi}}e^{-\frac12(1·18)^2} = ·157$.

A series of such calculations leads to the following table.
Value of p.      Approximate chance of obtaining p' = ·1.

·00-·02          ·000
·02-·04          ·002
·04-·06          ·029
·06-·08          ·157
·08-·10          ·262
·10-·12          ·242
·12-·14          ·159
·14-·16          ·084
·16-·18          ·039
·18-·20          ·014
·20-·22          ·005
·22-·24          ·001
·24-·26          ·000

Total            ·994
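The calculation behind each row can be repeated mechanically. The sketch below follows the worked entry for p = ·07 (ordinate of the normal curve at z, multiplied by the interval width taken as a multiple of the standard deviation); the function name `approx_chance` and its parameters are assumptions made for illustration. Entries computed this way agree closely with the printed table near the centre, though the outlying rows need not match the printed figures to the last digit.

```python
from math import exp, pi, sqrt


def approx_chance(p, p_obs=0.1, n=100, width=0.02):
    """Approximate chance of observing p_obs from universes with p within width/2 of the midpoint."""
    sigma = sqrt(p * (1 - p) / n)          # standard deviation for this universe
    z = (p_obs - p) / sigma                # deviation of the observation in units of sigma
    ordinate = exp(-0.5 * z * z) / sqrt(2 * pi)
    return ordinate * width / sigma        # ordinate times abscissa measured in sigma


midpoints = [0.01 + 0.02 * k for k in range(13)]   # .01, .03, ..., .25
table = [(m, approx_chance(m)) for m in midpoints]
total = sum(chance for _, chance in table)

for m, chance in table:
    print(f"{m - 0.01:.2f}-{m + 0.01:.2f}  {chance:.3f}")
print(f"total  {total:.3f}")
```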

From the observations p would be given as ·1 with standard deviation $\sqrt{\frac{·1 \times ·9}{100}} = ·03$, and a considerable positive skewness.

The  values  of  p  differing  from  -i  by  more  than  twice  the 
standard  deviation  give  negligible  probabilities  whether  we 
suppose  them  a  priori  equally  probable  throughout  the  scale 
or  not. 

This example then illustrates a theorem that we may give as obvious: that, except in the neighbourhood of the central value, it is indifferent what distribution of a priori probabilities of p we suppose. Over the small, important central region the assumption that the a priori probability of p over a region is proportional to that region is likely to be a good first approximation, whatever the actual law. (Edgeworth, Statistical Journal, 1908, p. 387, and references there given.)

We  are  not,  then,  liable  to  any  considerable  error  from  the 
assumption  that  underlies  this  and  similar  investigations,  that 
the  important  values  of  the  quantities  sought  are  a  priori 
equally  probable  at  any  point  of  the  range  of  values  that 
affects  the  analysis. 

We may now sum up the result of finding p by sample. The most probable value in the universe is the observed value p'. The probability of a deviation, as great as $p_1$, from the observed value is given approximately by the normal error function with standard deviation

$$\sqrt{\frac{p'(1 - p')}{n}}, \quad\text{or}\quad \sqrt{\frac{p'(1 - p')}{n}\left(1 - \frac{n}{N}\right)}$$

where N is the number in the universe and $\frac{n}{N}$ is not negligible.
The precision of a measurement is measured by the reciprocal of the standard deviation of the errors to which it is liable.


General  Method. 

More generally let X' be any given function of n samples chosen at random from a universe where the (unknown) corresponding function is X, and let X = X' + x.

If we can show that the chance of obtaining the value X', when the value in the universe is X, is of the form $P_x = P_0e^{-\frac{x^2}{2\sigma^2}}$, where $P_0$ is the maximum chance and is obtained when X = X', $\sigma$ is constant, and x = X − X', then we can affirm with reasonable certainty that the sample gives evidence that the most probable value of the function in question is X', and that the chance of deviations from X' is given by the normal function with standard deviation $\sigma$. In the case above, p' is X', p is X, x is $z\sigma$ and $n\sigma^2 = p'(1 - p')$. For the more general case the process of inversion is not quite so direct.*

In order then to determine the precision of any measurement based on a sample, we have to take three steps, the first to find the standard deviation of the difference between the true value and the observed value, the second to find the chance that any assigned deviation would arise, the third to apply the principle of inverse probability.

* If all values of X are a priori equally probable the chance that the observations came from a universe when the value of X was within the limits X' ± x is $2\int_0^x P_x\,dx$, if x is small, and by inverse probability the chance that the value in the universe was within these limits is $\frac{2}{\sigma\sqrt{2\pi}}\int_0^x e^{-\frac{x^2}{2\sigma^2}}dx$.


Precision of the Arithmetic Average.

In Chapter III it was shown that if n quantities were selected at random and independently from a frequency group which satisfied certain conditions, the chance that the average of the n quantities differed from the average of the universe by as much as x was

$$\frac{1}{\sigma_a\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma_a^2}}dx, \quad\text{where } \sigma_a = \frac{\sigma}{\sqrt n},$$

$\sigma$ being the standard deviation in the universe.

It will be shown immediately that $\sigma'$, the observed standard deviation of the sample, differs from $\sigma$ by a quantity commensurate with $\frac{1}{\sqrt n}$, and hence if n is large, $\sigma'$ may be taken as equivalent to $\sigma$.

We may now complete the argument and say that if in a sample of n things, drawn independently from a group in which no large portion is distant more than, say, twice its standard deviation from its average, the average of the sample is $\bar x$ and its standard deviation is $\sigma'$, then the chance that the average in the universe differs from $\bar x$ by as much as x is

$$\frac{\sqrt n}{\sigma'\sqrt{2\pi}}\,e^{-\frac{nx^2}{2\sigma'^2}}dx \quad (118)$$

when n is large.

Precision  of  the  Standard  Deviation. 


We  will  now  extend  this  theorem  to  test  the  precision  of 
the  standard  deviation  and  second  moment  as  determined 
from  the  sample. 


Let $\bar x, \sigma, \mu_2 \dots$ be the unknown constants of the universe, and $\bar x + \bar x', \sigma', \mu_2' \dots$ be the corresponding values calculated from the sample.

Let $\bar x + x_t$ be any observation, and write $x_t' = x_t - \bar x'$.

The frequency curve of $x_t^2$ has $\mu_2$ for its average, and its standard deviation is given by $\sigma_1^2 = \text{mean}\,(x_t^2 - \mu_2)^2 = \mu_4 - \mu_2^2$.

Its fourth moment is $\text{mean}\,(x_t^2 - \mu_2)^4 = \mu_8 - 4\mu_6\mu_2 + 6\mu_4\mu_2^2 - 3\mu_2^4 = M_4$, say.

In the universe $\frac{\mu_s}{\sigma^s}$ is finite by hypothesis for all values of s, and therefore

$$\frac{M_4}{\sigma_1^4} = \frac{\dfrac{\mu_8}{\sigma^8} - 4\dfrac{\mu_6}{\sigma^6} + 6\dfrac{\mu_4}{\sigma^4} - 3}{\left(\dfrac{\mu_4}{\sigma^4} - 1\right)^2}$$

is finite.

Similarly, for any moment of $x_t^2$, $M_s \div \sigma_1^s$ is finite.

Hence the average of the quantities, $x_t^2$, as occurring in the sample, by the theorem summarised on p. 312 and just used, may differ from $\mu_2$ by an error with normal frequency and standard deviation

$$\frac{\sigma_1}{\sqrt n} = \sqrt{\frac{\mu_4 - \mu_2^2}{n}}.$$

This average, $m_2$, say,

$$= \frac{\Sigma x_t^2}{n} = \frac{1}{n}\Sigma(x_t' + \bar x')^2 = \frac{1}{n}\Sigma x_t'^2 + \bar x'^2, \text{ since } \Sigma x_t' = 0, \; = \mu_2' + \bar x'^2.$$

Now $\bar x'^2$ is of order $\frac{1}{n}$ from formula (118), and the error in $m_2$ has just been shown to be of order $\frac{1}{\sqrt n}$. Hence $\bar x'^2$ can be neglected, and $\mu_2'$ written for $m_2$.

Hence the observed $\mu_2'$ differs from $\mu_2$ in the universe by an error with normal frequency and standard deviation

$$\sqrt{\frac{\mu_4 - \mu_2^2}{n}} \quad (119)$$

But $\sigma^2 = \mu_2$, $\delta\sigma = \frac{\delta\mu_2}{2\sigma}$; hence $\sigma'$ differs from $\sigma$ by an error with normal frequency and standard deviation

$$\sqrt{\frac{\mu_4 - \mu_2^2}{4\mu_2 n}} \quad (120)$$

If the universe was normal, $\mu_4 = 3\mu_2^2$, and the standard deviations of the observed standard deviation and second moment become

$$\frac{\sigma}{\sqrt{2n}} \text{ and } \sigma^2\sqrt{\frac{2}{n}} \quad (121)$$

respectively.
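Formulas (119) to (121) translate directly into arithmetic. The following is a minimal sketch under stated assumptions: the function names are invented here, and the moments passed in are taken as already known (in practice the observed moments would be substituted, as the text goes on to justify).

```python
from math import sqrt


def se_second_moment(mu2, mu4, n):
    """Standard deviation of the observed second moment, formula (119)."""
    return sqrt((mu4 - mu2 ** 2) / n)


def se_standard_deviation(mu2, mu4, n):
    """Standard deviation of the observed standard deviation, formula (120)."""
    return sqrt((mu4 - mu2 ** 2) / (4 * mu2 * n))


def normal_case(sigma, n):
    """Formula (121): the two results when mu4 = 3 * mu2**2."""
    return sigma / sqrt(2 * n), sigma ** 2 * sqrt(2 / n)


# Illustrative values: sigma = 2, so mu2 = 4 and, for a normal universe, mu4 = 48.
sigma, n = 2.0, 200
mu2, mu4 = sigma ** 2, 3 * sigma ** 4
print(se_standard_deviation(mu2, mu4, n), se_second_moment(mu2, mu4, n))
print(normal_case(sigma, n))  # the same pair, obtained from (121)
```

The values sigma = 2 and n = 200 are arbitrary and serve only to show that (120) and (119) reduce to (121) in the normal case.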

By a similar method the standard deviation of

$$m_4 = \frac{\Sigma x_t^4}{n} \text{ is } \sqrt{\frac{\mu_8 - \mu_4^2}{n}},$$

where  w4  =  ^  S  (^'  —  T')4  =  /x4'  —  4^'/x3'  -f  6x'2fx2  —  3^'4. 

Hence the error in $\mu_4'$ is of as low order as that in $m_4$, that is of order $\frac{1}{\sqrt n}$.

We may therefore, in calculating the standard deviations of $\mu_2$ and $\sigma$, replace the unknown $\mu_4$ and $\mu_2$ by the known $\mu_4'$ and $\mu_2'$, and in calculating the standard deviation of the average we are justified in writing $\sigma'$ for $\sigma$.
The standard deviations and frequency curve of errors in higher moments or in the correlation coefficient cannot be, at any rate readily, calculated by this method,* and the whole basis is reconstituted and the arguments reset in the following paragraphs which are based on the papers to which reference is made at the head of the chapter.

Standard Deviations of the Average, etc., without Reference to Inverse Probability.

Suppose that in a universe containing N measurable objects, there are $N \times y_1$ at measurement $x_1$, $N \times y_2$ at $x_2$ . . . , and that n objects are selected at random, n/N being so small that the chance of getting any value of x is not affected by previous selections.

$$N = N \times y_1 + N \times y_2 + \dots, \qquad y_1 + y_2 + \dots = 1.$$

Let $\bar x'$, $\sigma'$, and $\mu_2'$ be the average, standard deviation and second moment found from the samples. Required to determine the precision of these values as representatives of the average, standard deviation, and second moment for the universe.

Suppose $x_1, x_2 \dots$ to be measured from the (unknown) average in the universe, so that $\mu_1 = \Sigma(x_ty_t) = 0$.

Let $\mu_2$ be the second moment for the universe, so that $\mu_2 = \Sigma\,x_t^2y_t$, and write $\mu_2 = \sigma^2$.

The sample will not, of course, contain precisely $n \times y_1$ at $x_1$, $n \times y_2$ at $x_2$ etc.

Let the numbers actually found be $n(y_1 + e_1)$ at $x_1$ . . . $n(y_t + e_t)$ at $x_t$ . . .

Then $e_1 + e_2 + \dots + e_t + \dots = 0$.

$x_1, x_2 \dots$ are, of course, constant.

Since $y_t$ is the chance of finding an object at $x_t$ and the experiment is made n times, $e_t$ has normal frequency with standard deviation $\sqrt{\frac{y_t(1 - y_t)}{n}}$.

Hence the mean of all values of $e_t^2$ is $\frac{y_t(1 - y_t)}{n}$, and $e_t$ is of order $1/\sqrt n$.

*  The  method  is  based  on  communications  to  the  author  from  Professor 
Edgeworth. 
Let E be the aggregate error in all the compartments together other than the sth and the tth. Write Y for $1 - y_s - y_t$.

Then $e_s + e_t + E = 0$

$2e_se_t = E^2 - e_s^2 - e_t^2$

$$\therefore\ \text{Mean } e_se_t = \tfrac12\text{ mean } E^2 - \tfrac12\text{ mean } e_s^2 - \tfrac12\text{ mean } e_t^2$$
$$= \frac{1}{2n}\{Y(1 - Y) - y_s(1 - y_s) - y_t(1 - y_t)\}$$
$$= \frac{1}{2n}\{(1 - y_s - y_t)(y_s + y_t) - y_s(1 - y_s) - y_t(1 - y_t)\}$$
$$= -\frac{y_sy_t}{n}.$$


Now let F be any linear function of $y_1, y_2 \dots$, so that

$$F = a_1y_1 + a_2y_2 + \dots,$$

where $a_1, a_2 \dots$ are known constants.

Write $F + f = a_1(y_1 + e_1) + \dots + a_t(y_t + e_t) + \dots$
so that $f = a_1e_1 + \dots + a_te_t + \dots$

$$f^2 = \Sigma a_t^2e_t^2 + 2\Sigma a_sa_te_se_t.$$

Then, if $\sigma_f^2$ is written for the mean value of $f^2$ when all possible values of $e_1, e_2, \dots$ have been found in due proportion,

$$\sigma_f^2 = \Sigma a_t^2\,(\text{mean } e_t^2) + 2\Sigma a_sa_t\,(\text{mean } e_se_t)$$
$$= \frac{1}{n}\{\Sigma a_t^2y_t(1 - y_t) - 2\Sigma a_sa_ty_sy_t\}$$
$$= \frac{1}{n}\{\Sigma a_t^2y_t - F^2\} \quad (122)$$
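Formula (122) can be verified for any particular set of constants by summing the mean squares and mean products term by term. The sketch below is illustrative only; the function names and the sample values of the $y_t$ and $a_t$ are assumptions, and exact fractions are used so that the two sides can be compared for equality.

```python
from fractions import Fraction


def variance_term_by_term(a, y, n):
    """Mean of f^2 built from mean e_t^2 = y_t(1 - y_t)/n and mean e_s e_t = -y_s y_t/n."""
    total = Fraction(0)
    for s in range(len(y)):
        for t in range(len(y)):
            if s == t:
                mean_product = y[s] * (1 - y[s]) / n
            else:
                mean_product = -y[s] * y[t] / n
            total += a[s] * a[t] * mean_product
    return total


def formula_122(a, y, n):
    """(1/n) * (sum of a_t^2 y_t - F^2), with F = sum of a_t y_t."""
    F = sum(a_t * y_t for a_t, y_t in zip(a, y))
    return (sum(a_t * a_t * y_t for a_t, y_t in zip(a, y)) - F * F) / n


# Hypothetical universe with three compartments and arbitrary constants a_t.
y = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
a = [Fraction(-2), Fraction(1), Fraction(3)]
n = 50
lhs = variance_term_by_term(a, y, n)
rhs = formula_122(a, y, n)
print(lhs, rhs)
```

The double sum counts each pair (s, t) twice, which supplies the factor 2 in front of the product terms.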

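Put $a_1 = x_1 \dots a_t = x_t \dots$

$$F = \Sigma x_ty_t = 0$$
$$\bar x' = F + f = \Sigma x_te_t$$

$$\sigma_{\bar x'}^2\ (= \text{mean of } \bar x'^2) = \frac{1}{n}\Sigma x_t^2y_t \text{ from (122)} = \frac{\mu_2}{n} \quad (123)$$

$\bar x'$ is therefore of the order $\frac{1}{\sqrt n}$.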


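Now put $a_1 = x_1^2 \dots a_t = x_t^2$

$$\mu_2 = F = \Sigma\,x_t^2y_t, \qquad \mu_2' = \Sigma\,(x_t - \bar x')^2(y_t + e_t)$$
$$\mu_2' - \mu_2 = \Sigma x_t^2e_t - 2\bar x'\Sigma x_ty_t + \text{terms involving } \bar x'e_t \text{ and } \bar x'^2$$

which are of order $\frac{1}{n}$.

$\therefore\ \mu_2' - \mu_2 = \Sigma x_t^2e_t$, since $\Sigma x_ty_t = 0$, if terms in $\frac{1}{n}$ are neglected.

$\therefore$ from (122)

$$\sigma_{\mu_2}^2 = \text{mean of } (\mu_2' - \mu_2)^2 = \frac{1}{n}\{\Sigma(x_t^4y_t) - \mu_2^2\} = \frac{\mu_4 - \mu_2^2}{n} \quad (124)$$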


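Now $\sigma^2 = \mu_2$.

Hence increments $\delta\sigma$, $\delta\mu_2$ of $\sigma$ and $\mu_2$ are connected by the equation

$$2\sigma\,\delta\sigma = \delta\mu_2, \quad\text{or}\quad \delta\sigma = \frac{\delta\mu_2}{2\sigma} \quad (125)$$

Hence

$$\sigma_\sigma^2 = \text{mean}\,(\delta\sigma)^2 = \text{mean}\,\frac{(\delta\mu_2)^2}{4\sigma^2} = \frac{\sigma_{\mu_2}^2}{4\mu_2},$$

$$\sigma_\sigma^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2n} \text{ from (124).}$$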

A similar analysis leads to the general result

$$\sigma_{\mu_p}^2 = \frac{1}{n}\left(\mu_{2p} - 2p\,\mu_{p-1}\mu_{p+1} + p^2\mu_{p-1}^2\mu_2 - \mu_p^2\right) \quad (126)$$

Hence the standard deviations of $\bar x'$, $\sigma'$ and all moments involve the factor $\frac{1}{\sqrt n}$, and if n is large the difference between the apparent and true measurements is of the order $\frac{1}{\sqrt n}$ and may be neglected in formulae involving them. Consequently in evaluating the formulae 123 to 126 the calculated values of the moments $\mu_2'$ etc. can be substituted for the unknown true values.


Notice  that  the  standard  deviation  of  each  moment  depends 
on  the  moment  of  twice  its  order,  and  this  higher  moment 
rapidly  becomes  great  as  the  order  is  increased.  In  practice 
it  is  found  that  with  ordinary  values  of  n  the  moments  above 
the  4th  lack  precision  for  this  reason. 

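If the frequency curve of the universe is normal $\mu_4 = 3\mu_2^2$, and

$$\sigma_{\bar x} = \frac{\sigma}{\sqrt n}, \qquad \sigma_\sigma = \frac{\sigma}{\sqrt{2n}}, \qquad \sigma_{\mu_2} = \sigma^2\sqrt{\frac{2}{n}} \quad (127)$$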


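The first two results for the normal curve can be obtained more directly as follows. Let $X_1, X_2 \dots X_n$ be the measurements of n things taken at random from a group whose frequency curve is

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - x_0)^2}{2\sigma^2}},$$

where $x_0$ and $\sigma$ are unknown.

$P_x$, the probability that these particular n things will be selected, is

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(X_1 - x_0)^2}{2\sigma^2}} \times \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(X_2 - x_0)^2}{2\sigma^2}} \times \dots = \frac{1}{\sigma^n(2\pi)^{n/2}}\,e^{-\frac{\Sigma(X_t - x_0)^2}{2\sigma^2}}.$$

Let $\bar x$ be the average of the X's, $X_t = \bar x + x_t$ etc., so that $\Sigma x_t = 0$.

$$\log P_x = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\Sigma\{\bar x - x_0 + x_t\}^2$$
$$= -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\{n(\bar x - x_0)^2 + ns^2\},$$

where s is the standard deviation of the X's.

Here $\bar x$ and s are known and $\sigma$ and $x_0$ are to be determined. $P_x$ is greatest when $\bar x - x_0$ is least, whatever the value of $\sigma$. Give $x_0$ the value $\bar x$.

Then $P_x$ is greatest when $\frac{dP_x}{d\sigma}$ is zero, that is when

$$\frac{n}{\sigma} = \frac{ns^2}{\sigma^3}, \text{ and } \sigma = s.$$

Write $\sigma = s + \gamma$ and $x_0 = \bar x + \delta$, and let $P_0$ be the value of $P_x$ when $\gamma = 0 = \delta$.

Then

$$\log P_x - \log P_0 = -n\log\frac{\sigma}{s} - \frac{n}{2\sigma^2}(\delta^2 + s^2) + \frac{n}{2s^2}\cdot s^2$$

$$\frac{1}{n}\log\frac{P_x}{P_0} = -\log\left(1 + \frac{\gamma}{s}\right) - \frac{1}{2s^2}\left(1 + \frac{\gamma}{s}\right)^{-2}(\delta^2 + s^2) + \frac12$$
$$= -\frac{\gamma}{s} + \frac{\gamma^2}{2s^2} - \frac{1}{2s^2}\left(1 - \frac{2\gamma}{s} + \frac{3\gamma^2}{s^2}\right)(\delta^2 + s^2) + \frac12$$
$$= -\frac{\delta^2}{2s^2} - \frac{\gamma^2}{s^2}, \text{ neglecting } \gamma^3, \gamma\delta^2 \text{ etc.}$$

Hence the errors in $\bar x$ and s are independent of each other, are of normal frequency, and their standard deviations are respectively $\frac{s}{\sqrt n}$ and $\frac{s}{\sqrt{2n}}$, where s is the standard deviation of the sample, and for it $\sigma$, the standard deviation in the universe, may be written without perceptible error when n is large, as in the results already obtained.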
Standard Deviation of the Correlation Coefficient.

The standard deviation of the error to which the correlation coefficient is liable may be found as follows. (Here Dr. Sheppard's method is followed.)

Let there be pairs of values, such as $(x_t, y_t)$, measured from their averages, whose standard deviations are $\sigma_1, \sigma_2$ and second moments $\lambda, \mu$. Let the whole number of pairs be N, and let $z_tN$ be situated at $(\bar x + x_t, \bar y + y_t)$, so that $z_1 + \dots + z_t \dots = 1$.

Also $\Sigma z_tx_t = 0 = \Sigma z_ty_t$.

Write $M_{st}$ for the mean of $x^sy^t$ taken over all the pairs, so that $M_{st} = \Sigma\,zx^sy^t$. Write M for $M_{11}$.

Then if r is the coefficient of correlation of the N pairs,

$$r = \frac{M}{\sigma_1\sigma_2} \text{ and } \log r = \log M - \tfrac12\log\lambda - \tfrac12\log\mu.$$

Now let a selection of n pairs be made at random, and let the number selected from the position $x_t, y_t$ be $(z_t + e_t)\,n$.

If $\bar x'$, $\bar y'$ are the resulting averages

$$\bar x' = \Sigma(z_t + e_t)x_t = \Sigma x_te_t, \text{ and } \bar y' = \Sigma y_te_t.$$

The resulting deviations between the values in the sample and the values in the universe of r, M, $\lambda$, $\mu$ are, by differentiating the equation for log r, evidently connected by

$$\frac{\delta r}{r} = \frac{\delta M}{M} - \frac{\delta\lambda}{2\lambda} - \frac{\delta\mu}{2\mu}.$$

Now

$$\delta M = \Sigma(z_t + e_t)(x_t - \bar x')(y_t - \bar y') - \Sigma z_tx_ty_t = \Sigma x_ty_te_t - \bar x'\,\Sigma z_ty_t - \bar y'\,\Sigma z_tx_t,$$

when products of any two of the small quantities $e_t$, $\bar x'$, $\bar y'$, which are of order $\frac{1}{\sqrt n}$, are neglected.

$\therefore\ \delta M = \Sigma x_ty_te_t$.

As shown above (pp. 419-20),

$$\delta\lambda = \Sigma\,x_t^2e_t \text{ and } \delta\mu = \Sigma\,y_t^2e_t.$$

$$\therefore\ \frac{\delta r}{r} = \Sigma\left(\frac{x_ty_t}{M} - \frac{x_t^2}{2\lambda} - \frac{y_t^2}{2\mu}\right)e_t.$$

Hence, from the general formula (122), if $\sigma_r$ is the standard deviation of the errors in r,


err  — 


n 


S 


(xtyt  xr  yt 
2A  2/x 


,2\  2 


V  M 

I  ^22  |  ^4  1 
n  \  M2  1  4A.2  +  4^2 


M, 


31 


AM 


cJ7 *l_y±\zt 

1\M  "~2A  2/i/ 

_  M^2  ( 

fx  M  2A/X  J , 
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since 


c  (xiyt  xt2  yt2\  ,  . 

s'—  = 


M 


2/x/ 


where  A4  and  are  the  fourth  moments  of  the  distribution  of  the 
N  x*s  and  N  y’s. 

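In the case where the original distribution is that given by the normal correlation surface, $\lambda_4 = 3\lambda^2$, $\mu_4 = 3\mu^2$, $M = r\sigma_1\sigma_2$, $M_{22} = (1 + 2r^2)\,\sigma_1^2\sigma_2^2$, $M_{31} = 3r\sigma_1^3\sigma_2$, $M_{13} = 3r\sigma_1\sigma_2^3$ (formula (106)), and

$$\frac{\sigma_r^2}{r^2} = \frac{1}{n} \left\{ \frac{1 + 2r^2}{r^2} + \frac34 + \frac34 - 3 - 3 + \frac{1 + 2r^2}{2} \right\} = \frac{(1 - r^2)^2}{n r^2},$$

and

$$\sigma_r = \frac{1 - r^2}{\sqrt{n}} . \qquad (129)$$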


This  is  the  value  generally  used,  it  being  implicitly  assumed 
that  the  distribution  approximates  to  the  normal. 
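The reduction of the general moment expression to formula (129) can be checked numerically. The following is a minimal sketch, not part of the original text; the function names and the illustrative values of r, the standard deviations and n are assumptions chosen only for the check.

```python
import math

def sd_correlation(n, M, lam, mu, M22, M31, M13, lam4, mu4):
    """Standard deviation of r from the general moment expression in the text."""
    r = M / math.sqrt(lam * mu)
    bracket = (M22 / M**2 + lam4 / (4 * lam**2) + mu4 / (4 * mu**2)
               - M31 / (lam * M) - M13 / (mu * M) + M22 / (2 * lam * mu))
    return abs(r) * math.sqrt(bracket / n)

def sd_correlation_normal(n, r):
    """Formula (129): the value for a normal correlation surface."""
    return (1 - r**2) / math.sqrt(n)

# Illustrative values (assumed, not taken from the book).
r, s1, s2, n = 0.6, 2.0, 3.0, 400
lam, mu = s1**2, s2**2

# Moments of the normal correlation surface, as quoted from formula (106).
general = sd_correlation(
    n, M=r * s1 * s2, lam=lam, mu=mu,
    M22=(1 + 2 * r**2) * lam * mu,
    M31=3 * r * s1**3 * s2, M13=3 * r * s1 * s2**3,
    lam4=3 * lam**2, mu4=3 * mu**2,
)
normal = sd_correlation_normal(n, r)
print(general, normal)
```

Both expressions should agree to rounding error, which is the algebraic identity used in passing from the general result to (129).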

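The regression coefficient, when $y$ is expressed in terms of $x$, is $r\,\dfrac{\sigma_2}{\sigma_1} = \rho$, say. In the present notation $\rho = \dfrac{M}{\lambda}$, and we obtain by a method similar to that just used,

$$\sigma_\rho^2 = \frac{\rho^2}{n} \left\{ \frac{M_{22}}{M^2} + \frac{\lambda_4}{\lambda^2} - \frac{2 M_{31}}{\lambda M} \right\} \text{ in any distribution.}$$

Hence in normal distribution

$$\sigma_\rho^2 = \frac{\rho^2}{n} \left\{ \frac{1 + 2r^2}{r^2} + 3 - 6 \right\} = \frac{\rho^2}{n} \cdot \frac{1 - r^2}{r^2}, \quad\text{and}\quad \sigma_\rho = \frac{1}{\sqrt{n}} \cdot \frac{\sigma_2}{\sigma_1} \cdot \sqrt{1 - r^2} .$$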

In  the  case  of  normal  distribution  the  result  may  be  reached  as  follows. 
(Here  Professor  Pearson’s  method  is  followed.) 

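Suppose pairs $x_1, y_1 \dots x_n, y_n$ are chosen from a surface whose unknown centre is $x_0, y_0$, standard deviations $\sigma_1, \sigma_2$, and mean product $r\sigma_1\sigma_2$.

Let $\bar{x}, \bar{y}, s_1, s_2, r'$ be calculated from the sample.

The chance of concurrence of the $n$ pairs is

$$P_z = \frac{1}{\left(2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\right)^n}\; e^{-\frac{1}{2(1 - r^2)}\, S \left\{ \frac{(x_t - x_0)^2}{\sigma_1^2} + \frac{(y_t - y_0)^2}{\sigma_2^2} - \frac{2r (x_t - x_0)(y_t - y_0)}{\sigma_1\sigma_2} \right\}} .$$

$$\therefore\ \log P_z = -n \log 2\pi\sigma_1\sigma_2 - \frac{n}{2} \log (1 - r^2) - \frac{n}{2(1 - r^2)} \left\{ \frac{s_1^2 + d_1^2}{\sigma_1^2} + \frac{s_2^2 + d_2^2}{\sigma_2^2} - \frac{2r\,(r' s_1 s_2 + d_1 d_2)}{\sigma_1\sigma_2} \right\},$$

where $d_1 = x_0 - \bar{x}$, $d_2 = y_0 - \bar{y}$.

Here $r, \sigma_1, \sigma_2, d_1, d_2$ are unknown, and $r', s_1, s_2$ known.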
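By expressing the conditions

$$\frac{\partial P}{\partial d_1} = 0 = \frac{\partial P}{\partial d_2} = \frac{\partial P}{\partial \sigma_1} = \frac{\partial P}{\partial \sigma_2} = \frac{\partial P}{\partial r},$$

we obtain the values of the five unknowns which make $P_z$ a maximum.

$$\frac{d_1}{\sigma_1^2} = \frac{r d_2}{\sigma_1\sigma_2}, \qquad \frac{d_2}{\sigma_2^2} = \frac{r d_1}{\sigma_1\sigma_2},$$

whence $d_1 = d_2 = 0$, unless $r^2 = 1$.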

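Then, taking $d_1$ and $d_2$ to be zero,

$$-\frac{1}{\sigma_1} + \frac{s_1^2}{(1 - r^2)\,\sigma_1^3} - \frac{r r' s_1 s_2}{(1 - r^2)\,\sigma_1^2\sigma_2} = 0 = -\frac{1}{\sigma_2} + \frac{s_2^2}{(1 - r^2)\,\sigma_2^3} - \frac{r r' s_1 s_2}{(1 - r^2)\,\sigma_1\sigma_2^2},$$

$$\therefore\ (1 - r^2)\,\sigma_1^2\sigma_2 = s_1^2\sigma_2 - r r' s_1 s_2 \sigma_1,$$

$$(1 - r^2)\,\sigma_1\sigma_2^2 = s_2^2\sigma_1 - r r' s_1 s_2 \sigma_2,$$

$$\therefore\ \frac{s_1}{\sigma_1} = \frac{s_2}{\sigma_2} = k, \text{ say, and } 1 - r^2 = k^2 (1 - r r').$$

Also

$$\frac{r}{1 - r^2} - \frac{r}{(1 - r^2)^2} \left\{ \frac{s_1^2}{\sigma_1^2} + \frac{s_2^2}{\sigma_2^2} - \frac{2 r r' s_1 s_2}{\sigma_1\sigma_2} \right\} + \frac{r' s_1 s_2}{(1 - r^2)\,\sigma_1\sigma_2} = 0,$$

$$\therefore\ r (1 - r^2) - 2 r k^2 (1 - r r') + r' (1 - r^2)\, k^2 = 0.$$

Hence $r = r'$ and $k = 1$.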

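$P_z$ is greatest, therefore, when the values found in the sample are taken as the values in the surface. Write $P_0$ for the value of $P_z$ so obtained.

Now write $\sigma_1 = s_1 + \gamma_1$, $\sigma_2 = s_2 + \gamma_2$, and $r = r' + \rho$, and expand all functions in powers of the small quantities $d_1, d_2, \gamma_1, \gamma_2, \rho$, neglecting third powers.

We obtain

$$-\frac{1}{n} \log \frac{P_z}{P_0} = \frac{1}{2(1 - r'^2)} \left( \frac{d_1^2}{s_1^2} + \frac{d_2^2}{s_2^2} - \frac{2 r' d_1 d_2}{s_1 s_2} \right) - \frac{r'}{1 - r'^2} \left( \frac{\rho\gamma_1}{s_1} + \frac{\rho\gamma_2}{s_2} \right) - \frac{r'^2}{1 - r'^2} \cdot \frac{\gamma_1\gamma_2}{s_1 s_2} + \frac{2 - r'^2}{2(1 - r'^2)} \left( \frac{\gamma_1^2}{s_1^2} + \frac{\gamma_2^2}{s_2^2} \right) + \frac{1 + r'^2}{2(1 - r'^2)^2}\, \rho^2$$

$$= \frac{1}{2(1 - r'^2)} \left( \frac{d_1}{s_1} - \frac{r' d_2}{s_2} \right)^2 + \frac{d_2^2}{2 s_2^2} + \frac{2 - r'^2}{2(1 - r'^2)} \left( \frac{\gamma_1}{s_1} - \frac{r'^2}{2 - r'^2} \cdot \frac{\gamma_2}{s_2} - \frac{r'}{2 - r'^2}\, \rho \right)^2 + \frac{2}{2 - r'^2} \left( \frac{\gamma_2}{s_2} - \frac{r' \rho}{2(1 - r'^2)} \right)^2 + \frac{\rho^2}{2(1 - r'^2)^2} .$$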

Integrate successively between extreme limits

for $d_1$, regarding $d_2, \gamma_1, \gamma_2, \rho$ as constant,
for $d_2$, regarding $\gamma_1, \gamma_2, \rho$ as constant,
for $\gamma_1$, regarding $\gamma_2, \rho$ as constant,
and for $\gamma_2$, regarding $\rho$ as constant.

We then find that the whole chance of the observations arising from a value $r' + \rho$, whatever the values of $x_0, y_0, \sigma_1, \sigma_2$, is

$$P = K e^{-\frac{n \rho^2}{2 (1 - r'^2)^2}} .$$

That is, the distribution is normal, with standard deviation of $r' = \dfrac{1 - r'^2}{\sqrt{n}}$.
The work on pp. 421 and 424 shows that if the frequency
group from which samples are taken is normal, then the chances
of obtaining various errors in x, σ, μ and r in the sample are
given  by  the  normal  probability  function  ;  and  inversely,  that 
the  chances  that  the  corresponding  quantities  in  the  universe 
have  various  deviations  from  the  observed  quantities  are  also 
so  given.  It  remains  to  prove  that  under  other  conditions 
the  same  result  is  obtained. 

In each case the quantity concerned was put in the form

$$F + f = a_1 (y_1 + e_1) + a_2 (y_2 + e_2) + \dots ,$$

where $e_1 + e_2 + \dots = 0$, and $f = 0$ when $0 = e_1 = e_2 = \dots$. Also $y_1 + y_2 + \dots = 1$.

The frequency curves of $e_1, e_2 \dots$ are normal if n, the number in the sample, is large (p. 418).

If $e_1, e_2 \dots$ were independent of each other, or if the number of separate values of $x_1, x_2 \dots$ were so great that we could treat them as independent, then we could at once apply the theorem of pp. 295 seq. and state that the frequency of $f$ is normal.

The  full  analysis  (given  in  Appendix,  Note  9)  leads  to 
the  result  that  normality  may  be  presumed  under  the  same 
conditions  affecting  the  universe  from  which  the  samples  are 
taken as lead to normality of the average, viz.: that the universe is so confined that the ratio $\dfrac{a_t}{a_1}$ is finite for all values of t (p. 299).
TESTS OF CORRESPONDENCE BETWEEN DATA AND FORMULAE.

In  the  general  method  of  the  representation  of  observa¬ 
tions  by  a  mathematical  formula,  the  question  must  arise 
how  the  adequacy  of  the  formula  is  to  be  tested,  or,  as  it  is 
frequently  phrased,  a  test  of  the  goodness  of  fit  is  required. 

Consider  for  example  the  table  used  above  (p.  310)  of  the 
weekly  expenditure  on  food  per  “  unit  ”  in  970  families. 


Expenditure (s.)        m'          m            e = m' ~ m    Standard       e²/m
                        number of   calculated   difference.   deviations.
                        cases.      numbers.

Not exceeding 5.5          18          22            4            4.6           .7
5.5                       107         123           16           10.4          2.1
7.5                       255         234           21           13.3          1.9
9.5                       245         249            4           13.6           .1
11.5                      173         168            5           11.8           .1
13.5                      101          89           12            9.0          1.6
15.5                       38          51           13            7.0          3.3
17.5                       17          22            5            4.6          1.1
19.5                        9          11            2            3.3           .4
Over 21.5                   7           1            6             -          36.0

Totals                    970         970           88             -          47.3

The  calculated  numbers  are  from  the  second  approxima¬ 
tion  to  the  Law  of  Great  Numbers.  A  rough  method  formerly 
used  was  to  add  the  differences  between  the  calculated  numbers 
and  the  numbers  observed  in  each  compartment,  irrespective 
of  sign,  and  to  express  this  total  as  a  percentage  of  the  number 
of cases. The “percentage misfit” thus calculated is
88 ÷ 970 = 9.1 per cent.
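The arithmetic of the table can be reproduced in a few lines. This is a small sketch, not from the original text; the variable names are assumptions, and the figures are those of the table above.

```python
import math

# Observed (m') and calculated (m) numbers from the table of 970 families.
observed = [18, 107, 255, 245, 173, 101, 38, 17, 9, 7]
calculated = [22, 123, 234, 249, 168, 89, 51, 22, 11, 1]

N = sum(observed)
diffs = [abs(o - c) for o, c in zip(observed, calculated)]

# Percentage misfit: total of differences, irrespective of sign, over N.
misfit = 100 * sum(diffs) / N

# Standard deviation of each compartment: sigma^2 = m (1 - m/N).
sds = [math.sqrt(m * (1 - m / N)) for m in calculated]

# Last column of the table: e^2 / m for each grade.
contrib = [d * d / m for d, m in zip(diffs, calculated)]
total_all = sum(contrib)
total_without_last = sum(contrib[:-1])

print(round(misfit, 1), [round(s, 1) for s in sds[:-1]])
print(round(total_all, 1), round(total_without_last, 1))
```

The single observation expected above 21.5s. contributes 36.0 of the total 47.3, which is why the extreme grade is treated separately in what follows.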

The  weakness  of  this  method  is  that  it  is  not  related  to 
any  measurement  of  probability,  and  one  cannot  tell  at  sight 
whether  the  fit  is  good  or  not.  Of  two  competing  formulae, 
the  presumption  is  that  that  which  gives  the  lower  percentage 
misfit  is  the  better  ;  also  when  we  have  several  sets  of  similar 
observations we can tell roughly by this method which is
nearest  to  the  formula,  and  in  some  cases  in  which  set  the 
observations  are  most  regular. 

The  percentage  misfit  is  generally  diminished  if  compart¬ 
ments  are  merged  together. 

As regards the contents of individual compartments, we already have a simple test. If $m_t$ is the calculated number in a compartment when there are N observations in all, the chance of finding $m_t + e_t$ observations in this compartment in a random selection is

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{e_t^2}{2\sigma^2}} \text{ (formula (19))}, \quad\text{where}\quad \sigma^2 = \frac{m_t}{N} \left( 1 - \frac{m_t}{N} \right) N,$$

and the probability of exceeding any assigned multiple or sub-multiple of σ is given by the table (p. 271). The standard deviation for each grade in the above example except the last is given, and it is seen that four out of nine errors are less than σ, their standard deviation, two are between σ and $\frac{3\sigma}{2}$, and the remaining three less than 2σ. No separate measurement is improbable, and therefore the whole grouping may be presumed to be not improbable, except the final number, 7 above 21.5s.

That  numbers  in  extreme  grades  should  be  discontinuous 
in  relation  to  middle  grades  is  common  in  many  classes  of 
observations. 

The  deviations  are  not  independent,  however,  since  their 
total  must  be  zero  ;  and  even  if  the  deviation  in  one  compart¬ 
ment  taken  by  itself  is  improbably  large,  it  may  yet  not  be 
improbable  when  all  the  compartments  are  considered.  A 
measurement  which  allows  for  this  modification  has  been 
devised  by  Professor  Pearson,  and  part  of  the  analysis  in  a 
simplified  form,  a  brief  table  of  the  results,  and  some 
applications  are  given  in  the  following  paragraphs  (see  The 
Philosophical  Magazine,  No.  302,  July,  1900,  pp.  157-175). 

Suppose that a formula, which is presumed to represent the distribution of observations, leads to the expectation of $m_1, m_2 \dots m_n$ observations in n grades or compartments, when N, $= m_1 + m_2 + \dots + m_n$, is the whole number of observations.

In an experiment or group of observations, suppose that $(m_1 + e_1) \dots (m_t + e_t) \dots (m_n + e_n)$ are found in the compartments, so that $e_1 + \dots + e_t + \dots + e_n = 0$.

Write $p_t = \dfrac{m_t}{N}$.

Then $p_t$ is the chance that an observation from a group satisfying perfectly the formula will fall into the t-th grade.

The chance that $m_t + e_t$ will fall into this grade when N are chosen at random from an indefinitely large universe is

$$\frac{1}{\sigma_t\sqrt{2\pi}}\, e^{-\frac{e_t^2}{2\sigma_t^2}},$$

where $\sigma_t^2 = p_t (1 - p_t)\, N = p_t q_t N$, where $q_t = 1 - p_t$.

It can be shown that the joint chance of the errors named is

$$K e^{-\frac12 \chi^2}, \quad\text{where}\quad \chi^2 = S \frac{e_t^2}{m_t}, \quad\text{and}\quad S e_t = 0,$$

K being a constant.

For, if there were only two compartments, $e_1 + e_2 = 0$, and the joint chance equals the chance of either.


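Then $p = \dfrac{m_1}{N}$, $q = \dfrac{m_2}{N}$, $m_1 + m_2 = N$.

The chance is

$$\frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{e_1^2}{2\sigma_1^2}} = \frac{\sqrt{N}}{\sqrt{2\pi m_1 m_2}}\, e^{-\frac12 \left( \frac{e_1^2}{m_1} + \frac{e_2^2}{m_2} \right)},$$

since $\dfrac{e_1^2}{\sigma_1^2} = \dfrac{e_1^2 N}{m_1 m_2} = \dfrac{e_1^2 (m_2 + m_1)}{m_1 m_2}$, and $e_1^2 = e_2^2$.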

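If there are three compartments

$$e_1 + e_2 + e_3 = 0, \quad m_1 + m_2 + m_3 = N, \quad \sigma_1^2 = \frac{m_1}{N} \left( 1 - \frac{m_1}{N} \right) N,$$

and similarly for $\sigma_2^2$ and $\sigma_3^2$.

$$2 e_1 e_2 = e_3^2 - e_1^2 - e_2^2,$$

$$\therefore\ r\sigma_1\sigma_2 = \text{mean } e_1 e_2 = \tfrac12 (\sigma_3^2 - \sigma_1^2 - \sigma_2^2) = \frac{1}{2N} \left\{ m_3 (m_1 + m_2) - m_1 (m_2 + m_3) - m_2 (m_1 + m_3) \right\} = -\frac{m_1 m_2}{N} . \text{ (Compare p. 419.)}$$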


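The chance of the concurrence of $e_1$ and $e_2$, and therefore of $e_3$ also, is given by the normal correlation surface as

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)} \left( \frac{e_1^2}{\sigma_1^2} + \frac{e_2^2}{\sigma_2^2} - \frac{2 r e_1 e_2}{\sigma_1\sigma_2} \right)} .$$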
Now

$$\sigma_1^2\sigma_2^2 (1 - r^2) = \frac{m_1 m_2 m_3}{N},$$

since $m_1 + m_2 + m_3 = N$.


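Hence the index of e is

$$-\frac12 \left( \frac{e_1^2}{m_1} + \frac{e_2^2}{m_2} + \frac{e_3^2}{m_3} \right) .$$

Now if the second and third compartments had been merged into one containing M + E observations, where $M = m_2 + m_3$ and $E = e_2 + e_3$, the chance would have been

$$K_1 e^{-\frac12 \left( \frac{e_1^2}{m_1} + \frac{E^2}{M} \right)},$$

where $K_1$ is a constant.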

The effect, therefore, of dividing the second compartment without changing the first is to alter the constant and to replace $\dfrac{E^2}{M}$ by $\dfrac{e_2^2}{m_2} + \dfrac{e_3^2}{m_3}$ in the index.

Similarly if two compartments are given, the effect of dividing the third compartment without changing the first two must be to alter the constant and to replace $\dfrac{E_3^2}{M_3}$ by $\dfrac{e_3^2}{m_3} + \dfrac{e_4^2}{m_4}$ in the index, and so on.

Hence for n compartments the chance, P, of errors $e_1, e_2 \dots e_n$ is

$$P = K e^{-\frac12 \chi^2}, \quad\text{where}\quad \chi^2 = S \frac{e_t^2}{m_t} .$$

Notice that χ² is the same expression as is used in obtaining the coefficient of contingency.

[A  proof  of  the  formula,  without  the  above  method  of 
induction,  is  given  by  Pearson,  by  the  use  of  the  multiple 
correlation  equation.] 


If the selections in the compartments had been independent and without the condition that $e_1 + e_2 + \dots = 0$, the chance would have been

$$K e^{-\frac12 \chi^2} \times e^{-\frac12 \left( \frac{e_1^2}{N - m_1} + \dots \right)},$$

for the index would have been

$$-\frac12\, S \frac{e_1^2 N}{m_1 (N - m_1)} = -\frac12\, S \left( \frac{e_1^2}{m_1} + \frac{e_1^2}{N - m_1} \right) .$$

If there are many compartments and the largest of the fractions $\dfrac{m_t}{N}$ is small, the second part of the index is negligible compared with the first, and the two expressions tend to equality, and the effect of the correlation is small.

The  chance  of  the  occurrences  if  there  is  no  correlation  is 
less  than  that  when  there  is  correlation,  since  the  last  factor, 
if  not  negligible,  is  less  than  i.  (The  constant  is  eliminated  in 
further  processes.)  Hence  the  aggregation  of  uncorrelated 
chances,  which  is  simpler  than  the  present  method,  gives  an 
unduly  unfavourable  view  of  the  appropriateness  of  a  formula. 

The chance of every system of errors that gives a particular value of χ² is the same. Now, when the probability of a deviation from the mean in normal frequency is in question, it is customary to measure the probability that so great a deviation to left or right should have occurred, viz.,

$$\frac{2}{\sqrt{2\pi}} \int_z^\infty e^{-\frac12 z^2}\, dz .$$

Similarly here we may measure the chance of the occurrence of the system of errors or a less probable system by evaluating

$$\iint \dots \int K e^{-\frac12 \chi^2}\, d\chi, \quad\text{where } d\chi \text{ is written for } de_1 \cdot de_2 \dots de_{n-1},$$

and the integral is (n − 1)-fold and extended from χ to ∞, with the condition $e_1 + e_2 + \dots + e_n = 0$, K being so chosen that

$$\int_{-\infty}^{\infty} K e^{-\frac12 \chi^2}\, d\chi = 1 .$$

The  existence  of  this  condition  makes  the  integration 
complicated,  and  reference  should  be  made  to  Pearson's 
original  analysis  for  its  working  out. 

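The result is that

$$P = \sqrt{\frac{2}{\pi}} \int_\chi^\infty e^{-\frac12 \chi^2}\, d\chi + \sqrt{\frac{2}{\pi}}\; e^{-\frac12 \chi^2} \left( \frac{\chi}{1} + \frac{\chi^3}{1 \cdot 3} + \dots + \frac{\chi^{n-3}}{1 \cdot 3 \cdot 5 \dots (n - 3)} \right)$$

when n is even, and

$$P = e^{-\frac12 \chi^2} \left( 1 + \frac{\chi^2}{2} + \frac{\chi^4}{2 \cdot 4} + \dots + \frac{\chi^{n-3}}{2 \cdot 4 \dots (n - 3)} \right) \text{ when n is odd.} \qquad (131)$$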

A table of the values of P for various values of χ² and n is given in Biometrika, Vol. I, pp. 155 seq. We can, in a very brief form, obtain a working rule for determining whether a formula does or does not adequately represent an observed group by picking out values of χ² which for a given n make P = ½ or slightly more, or, further up the scale of improbability, make P = .0455 or slightly less, which corresponds to twice the standard deviation in the normal curve.
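The two series of formula (131) are easy to evaluate directly. The sketch below is an illustration rather than part of the original; the function name `pearson_p` is an assumption, n is the number of compartments as in the text, and the tail integral for even n is taken from the standard-library complementary error function.

```python
import math

def pearson_p(chi2, n):
    """P from formula (131) for n compartments and a given value of chi^2."""
    chi = math.sqrt(chi2)
    if n % 2 == 1:
        # 1 + chi^2/2 + chi^4/(2*4) + ... + chi^(n-3)/(2*4*...*(n-3))
        term, total = 1.0, 1.0
        for k in range(2, n - 2, 2):
            term *= chi2 / k
            total += term
        return math.exp(-chi2 / 2) * total
    # Even n: normal tail plus chi/1 + chi^3/(1*3) + ... + chi^(n-3)/(1*3*...*(n-3))
    tail = math.erfc(chi / math.sqrt(2))  # equals sqrt(2/pi) * integral from chi to infinity
    term, total = chi, 0.0
    for k in range(1, n - 2, 2):
        if k > 1:
            term *= chi2 / k
        total += term
    return tail + math.sqrt(2 / math.pi) * math.exp(-chi2 / 2) * total

for n, chi2 in [(3, 1), (3, 6), (4, 2), (4, 8), (7, 13), (9, 11.3)]:
    print(n, chi2, round(pearson_p(chi2, n), 3))
```

The printed values can be compared with the short working table that follows, and with the first illustration (χ² = 11.3, n = 9).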


  n.      χ².     P.        χ².     P.

   3       1     .61         6     .050
   4       2     .57         8     .046
   5       3     .56        10     .040
   6       4     .55        12     .035
   7       5     .54        13     .043
   8       6     .54        15     .036
   9       7     .54        16     .042
  10       8     .53        18     .035
  11       9     .53        19     .040
  12      10     .53        20     .045
  13      11     .53        22     .038
  14      12     .53        23     .042
  15      13     .53        24     .046
  16      14     .526       26     .038
  17      15     .525       27     .041
  18      16     .524       28     .045
  19      17     .523       30     .037
  20      18     .522
  25      23     .520
  30      28     .518

If χ² < n − 2, it is at least an even chance (as likely as not) that the
observations would be found from a group represented by the formula.

If χ² > 2n, the improbability is considerable.


Strictly,  the  test  should  be  applied  using  as  many  compart¬ 
ments  as  are  given  by  the  observations,  for  the  merging  of 
compartments  affects  the  resulting  value  of  P  ;  but  it  is  often 
difficult  to  get  back  to  ungraded  observations,  and  in  the  case 
of  continuous  variables,  such  as  height,  the  original  grades 
would  be  as  fine  as  the  measurements  could  be  made. 

A more serious difficulty is that in any compartment the observed $m_t + e_t$ must be integral, while $m_t$ is in general not integral, and some value of $e_t$ would be found in the most perfect representation. In consequence, the number to be expected in the least occupied compartment must be reasonably large, or we obtain spurious contributions to χ². This in practice rules out detailed extreme compartments, and in their rejection or fusion an element of arbitrariness is introduced and no fine measurement is possible.

On  the  other  hand,  when  we  are  testing  the  applicability 
of  the  normal  curve  of  error,  or  the  general  law  of  great  numbers, 
based  on  Edgeworth's  hypothesis  (p.  298-9),  there  is  no  expecta¬ 
tion  of  closeness  of  fit  on  abscissae  beyond  a  small  multiple  of 
the  standard  deviation — the  smaller  as  the  number  of  inde¬ 
pendent  elements  that  contribute  to  the  measurement 
diminishes — so  that  the  test  is  only  applicable  to  the  well- 
occupied  central  compartments  ;  but  in  choosing  the  extent 
over  which  the  test  is  made,  the  fineness  of  the  method  is  lost. 

Hence,  only  a  broad,  but  often  sufficiently  definite,  result 
can  be  obtained. 


Illustrations. 

If we neglect the extreme grade in Example 7, on p. 310, χ² = 11.3, n = 9, P = .18, and the formula “2nd approx.” is adequate.

If we take the Pearsonian formula, on the same page, χ² = 21.4, n = 9, P = .006, but if we exclude the lowest as well as the highest grade, χ² = 4.1, n = 8, P = .77; hence this formula expresses the central eight grades but not either extreme.

The same conclusions are reached if we simply take the standard deviations of the grades separately.

In the table on p. 309 relating to the ages of school children, n = 8. The normal curve gives χ² = 16.7 and P = .02, which is not satisfactory. The second approximation, however, gives χ² = .47 and P is indistinguishable from 1.

In the experiment on the numbers of letters in words (pp. 305-6), the sum of 10 words, graded by 5 letters, gives n = 13, and with the normal curve χ² = 33, P = .001, or omitting the lowest and two highest extreme grades, n = 10, χ² = 6.1, P = .73. The second approximation, however, including all grades, gives χ² = 8.4, P = .74.

The sums of 100 words graded by 20 letters give n = 10, χ² = 2.96, P = .965 with the normal curve, and no further approximation can improve on this.
An example of a different kind is found when a distribution found by sample is compared with the whole group from which the sample is taken, to verify the rules of sampling or the adequacy of the method.


Number of Companies Paying Dividends at Various Rates.

                              Number in    Relative numbers    Standard      e²/m
                              sample       in all companies    deviation.
                              m'.          m.

Below 3 per cent.                34            30                5.3          .53
3 per cent.                     108           108.8              8.9          0
4 per cent.                     117           124.4              9.3          .44
5 per cent.                      60            70.8              7.4         1.65
6 per cent. to 8 per cent.       48            43.2              6.2          .53
8 per cent.                      33            22.8              4.6         4.57

                                400           400                 -          7.72

Here n = 6, χ² = 7.72, P = .185. The result is fairly good, but spoilt
by the highest grade.


This  test  has  been  applied  to  the  distribution  in  two  dimen¬ 
sions,  in  the  experiment  tabulated  on  p.  394. 

The  24  squares,  3  to  left  and  right  of  centre,  and  2  above 
and  below  it,  which  contain  in  theory  11  or  more  observations, 
were  taken  as  separate  compartments.  Outlying  squares  were 
grouped  in  the  9  regions  shown  by  the  thick  lines,  rather 
arbitrarily,  so  as  to  get  contiguous  squares  which  aggregated 
to  at  least  9  expected  observations  in  the  second  approxima¬ 
tion.  The  results  are  as  follows  : — 


                         Normal surface.       2nd approximation.
                          χ².       P.           χ².       P.

24 central squares       20.8      .59          17.5      .79
9 outlying regions       27.8       -           10.1       -
33 regions               48.6      .035         27.6      .59

The  improvement  in  the  outlying  regions  by  the  use  of 
the  second  approximation  is  very  marked. 


APPENDIX.


MATHEMATICAL NOTES.

1. — Wallis’s Theorem for the Value of π.

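By simple graphic considerations it is evident that when n is a positive integer

$$\int_0^{\frac{\pi}{2}} \sin^{2n+1} x \cdot dx < \int_0^{\frac{\pi}{2}} \sin^{2n} x \cdot dx < \int_0^{\frac{\pi}{2}} \sin^{2n-1} x \cdot dx,$$

$$\therefore\ \frac{2 \cdot 4 \cdot 6 \dots 2n}{3 \cdot 5 \cdot 7 \dots (2n + 1)} < \frac{1 \cdot 3 \cdot 5 \dots (2n - 1)}{2 \cdot 4 \cdot 6 \dots 2n} \cdot \frac{\pi}{2} < \frac{2 \cdot 4 \cdot 6 \dots (2n - 2)}{3 \cdot 5 \cdot 7 \dots (2n - 1)},$$

$$\frac{2^{2n} (n!)^2}{(2n + 1)!} < \frac{(2n)!}{2^{2n} (n!)^2} \cdot \frac{\pi}{2} < \frac{2^{2n} (n!)^2}{(2n)!} \cdot \frac{1}{2n},$$

$$\frac{2^{2n} (n!)^2}{(2n)!\, \sqrt{2n + 1}} < \sqrt{\frac{\pi}{2}} < \frac{2^{2n} (n!)^2}{(2n)!\, \sqrt{2n}},$$

$$\therefore\ \sqrt{\frac{\pi}{2}} = \frac{2^{2n} (n!)^2}{(2n)!\, \sqrt{2n}} \text{ correct to } \frac{1}{n} . \qquad (132)$$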


(See  Gibson,  Treatise  on  the  Calculus,  1896,  Ex.  XXVI.  22.) 
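The bounds leading to (132) can be checked for a few values of n. This is a brief sketch added for illustration; the function name is an assumption.

```python
import math

def wallis_bounds(n):
    """Lower and upper bounds for sqrt(pi/2) from the inequality before (132)."""
    c = 4**n * math.factorial(n)**2 / math.factorial(2 * n)
    return c / math.sqrt(2 * n + 1), c / math.sqrt(2 * n)

target = math.sqrt(math.pi / 2)
for n in (1, 10, 100):
    lower, upper = wallis_bounds(n)
    print(n, lower, target, upper)
```

The gap between the two bounds shrinks roughly in proportion to 1/n, which is the sense of "correct to 1/n" in the text.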


2. — Sum of Powers of Integers.

If we suppose

$$S_m = \sum_{t=1}^{t=m} t^r = a\, m^{r+1} + b\, m^r + c\, m^{r-1} + \dots,$$

we can find a, b, c ... by induction.

For

$$(m + 1)^r = S_{m+1} - S_m = a \{ (m + 1)^{r+1} - m^{r+1} \} + b \{ (m + 1)^r - m^r \} + c \{ (m + 1)^{r-1} - m^{r-1} \} + \dots$$

Equating coefficients of $m^r, m^{r-1}, \dots$, we have

$$1 = a (r + 1),$$

$$r = a\, \frac{(r + 1)\, r}{2} + b r, \text{ and } b = \tfrac12,$$

$$\frac{r (r - 1)}{2} = a\, \frac{(r + 1)\, r (r - 1)}{6} + b\, \frac{r (r - 1)}{2} + c (r - 1), \text{ and } c = \frac{r}{12}, \text{ etc.}$$

$$\therefore\ \frac{1^r + 2^r + \dots + m^r}{m^{r+1}} = \frac{1}{r + 1} + \frac{1}{2m} + \frac{r}{12 m^2} + \dots$$

$$= \frac{1}{r + 1}, \text{ if } \frac{1}{m} \text{ is neglected,}$$

$$= \frac{1}{r + 1} + \frac{1}{2m}, \text{ if } \frac{1}{m^2} \text{ is neglected.} \qquad (133)$$
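A quick numerical comparison of the exact ratio with the first terms of the series is sketched below; the function names and the chosen values of m and r are assumptions made for illustration.

```python
def power_sum_ratio(m, r):
    """Exact value of (1^r + 2^r + ... + m^r) / m^(r+1)."""
    return sum(t**r for t in range(1, m + 1)) / m**(r + 1)

def series_approx(m, r, terms=3):
    """First `terms` terms of 1/(r+1) + 1/(2m) + r/(12 m^2)."""
    parts = [1 / (r + 1), 1 / (2 * m), r / (12 * m * m)]
    return sum(parts[:terms])

# For r = 3 the series stops after three terms, so the agreement is exact.
print(power_sum_ratio(100, 3), series_approx(100, 3))

# For r = 5 the two-term form (133) is in error by a quantity of order 1/m^2.
print(power_sum_ratio(50, 5) - series_approx(50, 5, terms=2))
```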


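3. — Stirling's Formula for m!

The first approximation to this formula may be obtained from Wallis's Theorem as follows.

Write

$$z = \frac{(2m)!}{(2m)^{m+1}\, (m - 1)!} = \frac{(2m - 1)(2m - 2) \dots (2m - m)}{(2m)^m} .$$

Then

$$\log z = \sum_{t=1}^{t=m} \log \left( 1 - \frac{t}{2m} \right),$$

$$-\log z = \frac{1 + 2 + \dots + m}{2m} + \dots + \frac{1^r + 2^r + \dots + m^r}{r\, (2m)^r} + \dots = \sum_{r=1}^{r=\infty} \frac{1}{r\, 2^r} \left\{ \frac{m}{r + 1} + \frac12 + \frac{r}{12m} \right\},$$

by Note 2, if higher powers of $\frac{1}{m}$ are neglected.

Now

$$\sum \frac{1}{r (r + 1)\, 2^r} = \sum \frac{2^{-r}}{r} - 2 \sum \frac{2^{-r-1}}{r + 1} = -\log (1 - \tfrac12) + 2 \{ \log (1 - \tfrac12) + \tfrac12 \} = 1 - \log 2 = \log \left( \frac{e}{2} \right) .$$

Hence

$$-\log z = m \log \left( \frac{e}{2} \right) - \tfrac12 \log (1 - \tfrac12) + \frac{1}{12m},$$

and

$$z = \left( \frac{2}{e} \right)^m \cdot 2^{-\frac12} \cdot e^{-\frac{1}{12m}} = 2^{m - \frac12}\, e^{-m} \left( 1 - \frac{1}{12m} + \dots \right) = 2^{m - \frac12}\, e^{-m}, \text{ if } \frac{1}{m} \text{ is neglected.}$$

But by Wallis’s theorem

$$m! = \frac{(2m)!\; (m\pi)^{\frac12}}{2^{2m} \cdot m!}, \text{ correct to } \frac{1}{m},$$

$$= z \times \frac{(m\pi)^{\frac12} \cdot m^m}{2^{m-1}},$$

$$\therefore\ m! = m^m \sqrt{2\pi m}\; e^{-m} . \qquad (134)$$

This formula gives an error of less than 1 per cent. for the value of 10! and rapidly reaches considerable accuracy if m is increased.

In its more complete form it is

$$m! = m^m \cdot \sqrt{2\pi m} \cdot e^{-m + \frac{1}{12m} - \frac{1}{360 m^3} + \dots}$$

(See Chrystal’s Algebra, Chap. XXX.)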
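The accuracy claimed for formula (134) and for the fuller form can be verified directly. The following sketch is illustrative only, and its function names are assumptions.

```python
import math

def stirling(m):
    """Formula (134): m! ~ m^m * sqrt(2*pi*m) * e^(-m)."""
    return m**m * math.sqrt(2 * math.pi * m) * math.exp(-m)

def stirling_refined(m):
    """The more complete form, with 1/(12m) - 1/(360 m^3) in the exponent."""
    return (m**m * math.sqrt(2 * math.pi * m)
            * math.exp(-m + 1 / (12 * m) - 1 / (360 * m**3)))

exact = math.factorial(10)
err_first = (exact - stirling(10)) / exact
err_refined = abs(exact - stirling_refined(10)) / exact
print(err_first, err_refined)
```

For m = 10 the first form falls short by a little under 1 per cent., as stated, while the fuller form agrees to about one part in a hundred million.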


4. — The Euler-Maclaurin Theorem, which connects Summation with Integration.

Let $f(a), f(a + h), \dots f(a + mh)$ be values of $f(x)$ at m + 1 successive values of x.

Then by Taylor’s expansion

$$f(a + h) = f(a) + h f'(a) + \frac{h^2}{2!} f''(a) + \dots$$

$$f(a + 2h) = f(a + h) + h f'(a + h) + \frac{h^2}{2!} f''(a + h) + \dots$$

$$f(a + mh) = f(a + \overline{m - 1}\, h) + h f'(a + \overline{m - 1}\, h) + \frac{h^2}{2!} f''(a + \overline{m - 1}\, h) + \dots$$

Write $d = a + \overline{m - 1}\, h$, and $b = a + mh$, and add.

$$f(b) - f(a) = h \sum_a^d F(x) + \frac{h^2}{2!} \sum_a^d F'(x) + \frac{h^3}{3!} \sum_a^d F''(x) + \dots,$$

where $F(x) = f'(x)$ and $\int F(x) \cdot dx = f(x) + \text{constant}$.

$$\therefore\ h \sum_a^d F(x) = \int_a^b F(x) \cdot dx - \frac{h^2}{2!} \sum_a^d F'(x) - \frac{h^3}{3!} \sum_a^d F''(x) - \dots$$

Similarly

$$h \sum_a^d F'(x) = \int_a^b F'(x)\, dx - \frac{h^2}{2!} \sum_a^d F''(x) - \dots$$

and

$$h \sum_a^d F''(x) = \int_a^b F''(x)\, dx - \dots$$

Combining these equations, we have

$$h \sum_a^d F(x) = \int_a^b F(x) \cdot dx - \frac{h}{2} \{ F(b) - F(a) \} + \frac{h^2}{12} \{ F'(b) - F'(a) \} + \text{terms involving } h^4,$$

and

$$h \sum_a^b F(x) = \int_a^b F(x) \cdot dx + \frac{h}{2} \{ F(b) + F(a) \} + \frac{h^2}{12} \{ F'(b) - F'(a) \} + \text{terms involving } h^4 . \qquad (135)$$
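Formula (135) can be tried on a function whose integral is known. The sketch below uses the exponential function on the interval from 0 to 1 with ten steps; the choice of function, interval and names is an assumption made purely for illustration.

```python
import math

def euler_maclaurin_rhs(F, dF, integral, a, b, h):
    """Right-hand side of (135), omitting the terms involving h^4."""
    return integral + h / 2 * (F(b) + F(a)) + h * h / 12 * (dF(b) - dF(a))

a, b, m = 0.0, 1.0, 10
h = (b - a) / m

# Left-hand side: h times the sum of F at a, a + h, ..., b.
lhs = h * sum(math.exp(a + k * h) for k in range(m + 1))

# For F(x) = e^x the derivative is e^x and the integral is e^b - e^a.
rhs = euler_maclaurin_rhs(math.exp, math.exp, math.exp(b) - math.exp(a), a, b, h)
print(lhs, rhs, abs(lhs - rhs))
```

The remaining discrepancy is of the order of h⁴, consistent with the neglected terms.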


In the figure let OA represent a, OB b, and AA' and BB' h.

AB = mh.

$hF(a)$, $hF(b)$ are the rectangular areas on AA', BB'. $h \sum_a^d F(x)$ is the sum of the rectangular areas on AB.

$\int_a^b F(x) \cdot dx$ is the curvilinear area on AB, and the term $\frac{h}{2} [F(a) - F(b)]$ is a first approximation for the defect of the curved from the rectilinear area.


Some difficulties arise in applying this theorem to the curve of error.

Here $F(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$, and, when x is not great compared with σ, should be represented by a finite vertical length.

$\dfrac{1}{\sigma} = \dfrac{1}{\sqrt{pqn}}$ is of order $\dfrac{1}{\sqrt{n}}$, and should be finite vertically.
The horizontal distance AB = mh should be finite. It is found in the analysis on pp. 265-8 that the number of successes above or below pn must be considered as of order $\sqrt{n}$; hence m is of order $\sqrt{n}$ and h (the unit step) is of order $\frac{1}{\sqrt{n}}$. In other words, in the drawing the rectangles must be supposed so thin that it takes a number of them comparable with $\sqrt{n}$ to give a finite breadth.

In the equation (135) then h is of order $\frac{1}{\sqrt{n}}$, $\sum_a^b F(x)$ contains $\sqrt{n}$ terms each finite, and therefore $h \sum_a^b F(x)$ is of order $F(x)$, as is $\int_a^b F(x) \cdot dx$.

The following terms on the right-hand side are successively of orders $\frac{1}{\sqrt{n}}$, $\frac{1}{n}$, etc. ($F'(b)$ is of course a simple numerical ratio.)

Now give h its value unity, and we have, for the normal curve of error in which terms of order $\frac{1}{\sqrt{n}}$ are neglected,

$$\text{aggregate chance of successes from } pn + x_1 \text{ to } pn + x_2 = \int_{x_1}^{x_2} F(x) \cdot dx . \qquad (136)$$

In the next approximation

$$P_x = P_0\, e^{-\frac{x^2}{2\sigma^2}} \left\{ 1 - \frac{\kappa}{2} \left( \frac{x}{\sigma} - \frac{x^3}{3\sigma^3} \right) \right\},$$

where terms in $\frac{1}{\sqrt{n}}$ are retained and terms in $\frac{1}{n}$ neglected. The h term in the formula (135) must be retained.

The result is most conveniently given as the sum of the chances of successes from pn to pn + x; for this purpose suppose A in the figure to be a half-unit to left of G, where OG = pn and G is the abscissa of the centre of gravity of the curve (p. 437). Then let GB = x.

Sum of chances from G to B = sum from A to B − ½ · $P_0$

$$= \int_0^x P_x \cdot dx + \tfrac12 (P_x + P_0) - \tfrac12 P_0 .$$

Write x = zσ.
Hence the sum of chances from 0 to zσ

$$= \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac12 z^2} \left\{ 1 - \frac{\kappa}{2} \left( z - \frac{z^3}{3} \right) \right\} dz + \frac{1}{2\sigma\sqrt{2\pi}}\, e^{-\frac12 z^2}, \qquad (137)$$

when terms of order $\frac{1}{n}$ are neglected.

Here $\sigma = \sqrt{pqn}$, $\kappa = \dfrac{q - p}{\sqrt{pqn}}$.

(See Todhunter’s History of the Theory of Probability, Art. 993.)

5. — Dr.  Sheppard’s  Corrections  for  the  Moments  of  Frequency 

Curves. 

(See  Biometrika,  Vol.  III.,  pp.  308  seq.) 

Let $y = f(x)$ be the equation of a continuous curve of frequency, whose area is unity.

Let $A_p$ be the area standing on the base $x_p \pm \frac{h}{2}$, p being integral, and let the values of $A_p$ for all values of p be known from the observations.

The t-th moment computed from the equation of the curve, say $m_t$, $= \int_a^b x^t \cdot f(x) \cdot dx$, where a and b are the extreme values of x.

The t-th moment computed from the observations, when each area is taken as concentrated at the middle point of its grade, say $\mu_t$, $= \sum_a^b x_p^t \cdot A_p$.

Required to find what correction should be made to $\mu_t$ to obtain $m_t$.

$$A_p = \int_{x_p - \frac{h}{2}}^{x_p + \frac{h}{2}} f(x) \cdot dx = \int_{-\frac{h}{2}}^{\frac{h}{2}} f(x_p + x) \cdot dx$$

$$= \int_{-\frac{h}{2}}^{\frac{h}{2}} \left\{ f(x_p) + x f'(x_p) + \frac{x^2}{2!} f''(x_p) + \dots \right\} \cdot dx$$

$$= h f(x_p) + \frac{h^3}{24} f''(x_p) + \frac{h^5}{1920} f^{(4)}(x_p) + \dots$$

Hence

$$\mu_t = \sum_a^b h\, x_p^t f(x_p) + \sum_a^b \frac{h^3}{24}\, x_p^t f''(x_p) + \sum_a^b \frac{h^5}{1920}\, x_p^t f^{(4)}(x_p) + \dots$$

= by the Euler-Maclaurin theorem (formula (135))
$$\int_a^b x^t f(x) \cdot dx + \frac{h}{2} \{ b^t f(b) + a^t f(a) \} + \frac{h^2}{12} \left[ \frac{d}{dx} \big( x^t f(x) \big) \right]_a^b - \frac{h^4}{720} \left[ \frac{d^3}{dx^3} \big( x^t f(x) \big) \right]_a^b$$

$$+ \frac{h^2}{24} \int_a^b x^t f''(x) \cdot dx + \frac{h^3}{48} \{ b^t f''(b) + a^t f''(a) \} + \frac{h^4}{288} \left[ \frac{d}{dx} \big( x^t f''(x) \big) \right]_a^b$$

$$+ \frac{h^4}{1920} \int_a^b x^t f^{(4)}(x) \cdot dx + \text{terms involving } h^5 .$$

Now restrict the investigation to the case where the curve is zero and touches the axis at both extremities, so that

$$f(a) = 0 = f(b) = f'(a) = f'(b),$$

and let the contact be so close that also

$$h^2 f''(a) = h^2 f''(b) = 0, \text{ and also } h^4 f'''(a) = h^4 f'''(b) = 0,$$

and in all these cases let the presence of a multiplier such as $a^t$, $b^t$ not make any significant difference from zero.

The expression reduces to

$$\int_a^b x^t f(x) \cdot dx + \frac{h^2}{24} \int_a^b x^t f''(x)\, dx + \frac{h^4}{1920} \int_a^b x^t f^{(4)}(x)\, dx + \text{terms involving } h^5 .$$

Then

$$\int_a^b x^t f''(x)\, dx = \big[ x^t f'(x) \big]_a^b - t \int_a^b x^{t-1} f'(x)\, dx = t (t - 1)\, m_{t-2},$$

and

$$\int_a^b x^t f^{(4)}(x)\, dx = t (t - 1)(t - 2)(t - 3)\, m_{t-4},$$

by continual integration by parts and use of the conditions.

Since h is generally small in comparison with the moments, terms involving $h^5$ can be neglected.

$$\therefore\ \mu_t = m_t + \frac{h^2}{24}\, t (t - 1)\, m_{t-2} + \frac{h^4}{1920}\, t (t - 1)(t - 2)(t - 3)\, m_{t-4}$$

approximately.

Giving $t$ the values 0, 1, 2, 3, 4 in succession, we have

$\mu_0 = m_0 =$ area of curve $= 1$.

$\mu_1 = m_1 =$ zero if the equation is referred to the vertical through the average.

$$\mu_2 = m_2 + \frac{h^2}{12}, \qquad m_2 = \mu_2 - \frac{h^2}{12} \quad \ldots (138)$$

$$m_4 = \mu_4 - \frac{h^2}{2}\,\mu_2 + \frac{7h^4}{240} \quad \ldots (139)$$

and $m_1 = \mu_1$, $m_3 = \mu_3$ when the moments are taken about the
vertical through the average.
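The corrections (138) and (139) can be tried numerically. The following is only an illustrative sketch, not part of the original text: it assumes a normal curve with unit standard deviation, groups it into classes of width h centred on multiples of h, forms the raw grouped moments, and then applies the two corrections. The function names and the choice h = 0.5 are assumptions made for the illustration.

```python
import math


def normal_cdf(x, s=1.0):
    """Area of the normal curve (standard deviation s) to the left of x."""
    return 0.5 * (1.0 + math.erf(x / (s * math.sqrt(2.0))))


def grouped_moments(h, s=1.0, span=12.0):
    """Raw second and fourth moments when each class area is placed at its mid-point x_p."""
    half = int(span * s / h)
    mu2 = 0.0
    mu4 = 0.0
    for p in range(-half, half + 1):
        x_p = p * h
        area = normal_cdf(x_p + h / 2, s) - normal_cdf(x_p - h / 2, s)
        mu2 += x_p ** 2 * area
        mu4 += x_p ** 4 * area
    return mu2, mu4


def corrected_moments(mu2, mu4, h):
    """Apply formulae (138) and (139) to the raw grouped moments."""
    m2 = mu2 - h ** 2 / 12
    m4 = mu4 - 0.5 * h ** 2 * mu2 + 7 * h ** 4 / 240
    return m2, m4


h = 0.5
mu2, mu4 = grouped_moments(h)
m2, m4 = corrected_moments(mu2, mu4, h)
print("raw moments      :", mu2, mu4)
print("corrected moments:", m2, m4)
```

For a curve with such close contact at both ends the corrected values should fall very near the true moments of the ungrouped curve (here 1 and 3), which is the situation the conditions on f at a and b are meant to secure.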


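6. — The Moments and Constants of the Second Approximation to the Generalised Curve of Error.

The equation to the curve is

$$y = \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}} \left\{ 1 - \frac{\kappa}{2}\left( \frac{x}{s} - \frac{x^3}{3s^3} \right) \right\}.$$

Write $m_p$ for

$$\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}}\, x^p\,dx.$$

Then $m_0 = 1$, $m_1 = m_3 = \ldots = m_{2p+1} = \ldots = 0$, $m_2 = s^2$,
$m_{2p} = 1 \cdot 3 \cdot 5 \ldots (2p-1) \cdot s^{2p}$ (formula (23)).
Write $M_p$ for the $p$th moment of the second approximation.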


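Then

$$M_p = \int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}} x^p\,dx - \frac{\kappa}{2s}\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}} x^{p+1}\,dx + \frac{\kappa}{6s^3}\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-\frac{x^2}{2s^2}} x^{p+3}\,dx$$

$$= m_p - \frac{\kappa}{2s}\, m_{p+1} + \frac{\kappa}{6s^3}\, m_{p+3}.$$

$\therefore\ M_{2p} = m_{2p}$, since $m_{2p+1} = 0 = m_{2p+3}$,

and therefore even moments are not affected by the inclusion of
the $\kappa$ term.

$$M_2 = s^2 \quad \ldots (140)$$

$$M_{2p+1} = -\frac{\kappa}{2s}\, m_{2p+2} + \frac{\kappa}{6s^3}\, m_{2p+4}, \text{ since } m_{2p+1} = 0.$$

$$M_1 = -\frac{\kappa}{2s}\left( m_2 - \frac{m_4}{3s^2} \right) = -\frac{\kappa}{2s}\left( s^2 - \frac{3s^4}{3s^2} \right) = 0.$$

$$M_3 = -\frac{\kappa}{2s}\left( m_4 - \frac{m_6}{3s^2} \right) = -\frac{\kappa}{2s}\left( 3s^4 - \frac{15s^6}{3s^2} \right) = \kappa s^3 \quad \ldots (141)$$

$$M_{2p+1} = -\frac{\kappa}{2s} \cdot 1 \cdot 3 \cdot 5 \ldots (2p+1) \cdot s^{2p+2} \left\{ 1 - \frac{1}{3s^2}(2p+3)s^2 \right\}$$

$$= \frac{p}{3} \cdot 1 \cdot 3 \cdot 5 \ldots (2p+1) \cdot s^{2p-2} \cdot M_3 \quad \ldots (142)$$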

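The origin is the average of the curve, since $M_1 = 0$.

To find the mode we must equate $\frac{dy}{dx}$ to zero.

$$\log\{y\, s\sqrt{2\pi}\} = -\frac{x^2}{2s^2} + \log\left\{ 1 - \frac{\kappa}{2}\left( \frac{x}{s} - \frac{x^3}{3s^3} \right) \right\} = -\frac{x^2}{2s^2} - \frac{\kappa}{2}\left( \frac{x}{s} - \frac{x^3}{3s^3} \right),$$

since $\kappa^2$ is of the order $\frac{1}{n}$ and neglected in the analysis of p. 295.

$$0 = \frac{1}{y}\frac{dy}{dx} = -\frac{x}{s^2} - \frac{\kappa}{2s}\left( 1 - \frac{x^2}{s^2} \right) \quad \ldots (143)$$

whence $x = -\frac{1}{2}\kappa s$, neglecting $\kappa^3$.

$\therefore$ distance that average is to the right of mode

$$= \tfrac{1}{2}\kappa s \quad \ldots (144)$$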


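Then area of the curve standing on the base $ON$, where $ON = x = zs$, is given by

$$Y_0^z = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\left\{ 1 - \frac{\kappa}{2}\left( z - \frac{z^3}{3} \right) \right\} dz$$

$$= F(z) - \frac{\kappa}{6\sqrt{2\pi}}\left\{ 1 - (1 - z^2)\, e^{-\frac{1}{2}z^2} \right\}$$

$$= F(z) - \kappa f(z),$$

where

$$F(z) = \frac{1}{\sqrt{2\pi}}\int_0^z e^{-\frac{1}{2}z^2}\,dz$$

and

$$f(z) = \frac{1}{6\sqrt{2\pi}}\left\{ 1 - (1 - z^2)\, e^{-\frac{1}{2}z^2} \right\}.$$

These functions are tabulated on p. 271 and p. 303.

$$Y_{-z}^0 = F(z) + \kappa f(z)$$

and the whole chance from $-z$ to $+z$, $Y_{-z}^{z}$, is $2F(z)$, as in the
normal curve.*
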
*  The  corresponding  formula  from  the  p,  q,  n  hypothesis,  using  the  Euler- 

e-k'x 

Maclaurin  theorem,  is  2F(z)  - — but  when  the  data  are  continuous 

SV  27 r 


the  last  term  drops  out. 


If $M$ is the position of the median we have

$\frac{1}{2}$ + area on $MO$ $= Y_{-\infty}^{0}$, and $\frac{1}{2}$ − area on $MO$ $= Y_0^{\infty}$.

$$\therefore\ 2 \text{ area on } MO = Y_{-\infty}^{0} - Y_0^{\infty} = \kappa\{f(-\infty) + f(\infty)\} = \frac{\kappa}{3\sqrt{2\pi}}.$$

The ordinates on the small base $OM$ differ from the ordinate at
$O$, viz., $\frac{1}{s\sqrt{2\pi}}$, only by terms involving $\kappa$.

$$\therefore\ 2MO \times \frac{1}{s\sqrt{2\pi}} = \frac{\kappa}{3\sqrt{2\pi}}, \text{ when } \kappa^2 \text{ is neglected},$$

and $MO = \frac{1}{6}\kappa s = \frac{1}{3}$ (distance from mode to average) . . . (145)
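Formulae (144) and (145) may be checked numerically. The sketch below is an illustration only and rests on stated assumptions: s is taken as 1, κ as 0.2, and the area from −∞ to x is written in the closed form Φ(x) + κ(1 − x²)φ(x)/6, which follows from integrating the equation of the curve and agrees with F(z) ∓ κf(z) above. The helper names and the bisection routine are hypothetical and not part of the original text.

```python
import math


def phi(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Area of the normal curve from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def skew_area(x, kappa):
    """Area of the second approximation from -infinity to x (s = 1)."""
    return Phi(x) + kappa * (1.0 - x * x) * phi(x) / 6.0


def slope_factor(x, kappa):
    """dy/dx divided by phi(x); it vanishes at the mode."""
    bracket = 1.0 - 0.5 * kappa * (x - x ** 3 / 3.0)
    return -x * bracket - 0.5 * kappa * (1.0 - x * x)


def bisect(func, lo, hi, tol=1e-12):
    """Root of func between lo and hi, assuming one change of sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


kappa = 0.2
median = bisect(lambda x: skew_area(x, kappa) - 0.5, -1.0, 1.0)
mode = bisect(lambda x: slope_factor(x, kappa), -0.5, 0.5)
print("median:", median, " compare -kappa/6 =", -kappa / 6)
print("mode  :", mode, " compare -kappa/2 =", -kappa / 2)
```

The small differences that remain between the roots and −κ/6, −κ/2 are of the order of the higher powers of κ neglected in the text.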


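Let the area on $MN$, where $M$ is the median and $N$ any point,
equal the area of the normal curve on $ON_1$, i.e., $F\left(\frac{x_1}{s}\right)$, where
$x_1 = ON_1$, and let $NN_1 = v$, where $v$, as can be seen in the following
analysis, is small and of the order $\kappa$.

$$ON = x_1 - v.$$

Then

$$F\left(\frac{x_1}{s}\right) = \text{area on } MO + \text{area on } (x_1 - v)$$

$$= \frac{\kappa}{6\sqrt{2\pi}} + \frac{1}{s\sqrt{2\pi}}\int_0^{x_1 - v} e^{-\frac{x^2}{2s^2}}\,dx - \frac{\kappa}{6\sqrt{2\pi}}\left\{ 1 - \left( 1 - \frac{x_1^2}{s^2} \right) e^{-\frac{x_1^2}{2s^2}} \right\}$$

$$= \frac{\kappa}{6\sqrt{2\pi}} + F\left(\frac{x_1}{s}\right) - \frac{v}{s\sqrt{2\pi}}\, e^{-\frac{x_1^2}{2s^2}} - \frac{\kappa}{6\sqrt{2\pi}} + \frac{\kappa}{6\sqrt{2\pi}}\left( 1 - \frac{x_1^2}{s^2} \right) e^{-\frac{x_1^2}{2s^2}},$$

where terms of order $v^2$, and $v\kappa$ are neglected.

$$\therefore\ v = \frac{\kappa s}{6}\left( 1 - \frac{x_1^2}{s^2} \right).$$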

The average, $s$, and $\kappa$ can be obtained if we know the
relative number of observations from the lowest to each of
three positions on the horizontal scale, and if we can assume
the equation of the frequency curve is that here in question.

The method is most readily explained if we take a numerical
example. On p. 309 we have

Limits of Age.        Number of Children.
0 to 13 years         .296 of 3044
0 to 15   „           .867  „   „
0 to 16   „           .969  „   „

Let $m$ be the median age, $s$ years the standard deviation, and
$\kappa s^3$ = third moment, all unknown.

In the figure above let $M$ represent the median age and $N$ the
age 15 years.

The area on $MN$ is then $.867 - .500 = F(1.112)$, (p. 271).

Hence

$$\frac{ON_1}{s} = \frac{x_1}{s} = 1.112 = z_1,$$

$$15 - m = MN = MO + ON_1 - NN_1 = \tfrac{1}{6}\kappa s + x_1 - \tfrac{1}{6}\kappa s\left( 1 - \frac{x_1^2}{s^2} \right),$$

$$15 - m = z_1 s + \tfrac{1}{6}\kappa s z_1^2, \text{ where } z_1 = 1.112.$$

Similarly

$$16 - m = z_2 s + \tfrac{1}{6}\kappa s z_2^2, \text{ where } z_2 = 1.866,$$

and

$$m - 13 = z_3 s - \tfrac{1}{6}\kappa s z_3^2, \text{ where } F(z_3) = .204 \text{ and } z_3 = .536.$$

A little consideration will show that the negative sign
must be taken when $N$ is to the left of $M$.

We have now three equations for determining $m$, $s$, and $\kappa$.

$$\tfrac{1}{6}\kappa s\,(z_1 - z_3) + s = \frac{2}{z_1 + z_3}$$

$$\tfrac{1}{6}\kappa s\,(z_2 - z_3) + s = \frac{3}{z_2 + z_3}$$

$\kappa s = .278$, $s = 1.187$, $\kappa = .234$, $m = 13.623$.
Average $= m + \frac{1}{6}\kappa s = 13.669$.

(Compare  Statistical  Journal,  1902,  pp.  339  to  348.) 

From moments depending on the whole nine grades, it was
found that $s = 1.190$, $\kappa = .206$, and average $= 13.665$.

If the average or median is known, or if the curve is known
to be normal and $\kappa = 0$, two observations are sufficient for
determining the remaining quantities.
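The arithmetic of the three-position method can be sketched as follows. This is an illustration, not the author's own working: the function name and argument order are hypothetical, the values of z are simply those quoted in the text from the table on p. 271, and c stands for κs/6.

```python
def solve_three_points(x_low, z_low, x_up1, z_up1, x_up2, z_up2):
    """Solve the three equations of the text for the median m, s, kappa and the average.

    x_low lies to the left of the median (the negative sign is taken for it);
    x_up1 and x_up2 lie to the right of the median.
    """
    # Adding the equation for x_low to each of the other two removes m:
    #   c (z_up - z_low) + s = (x_up - x_low) / (z_up + z_low),  with c = kappa s / 6
    rhs1 = (x_up1 - x_low) / (z_up1 + z_low)
    rhs2 = (x_up2 - x_low) / (z_up2 + z_low)
    c = (rhs2 - rhs1) / (z_up2 - z_up1)
    s = rhs1 - c * (z_up1 - z_low)
    kappa = 6.0 * c / s
    median = x_up1 - z_up1 * s - c * z_up1 ** 2
    average = median + c  # the average lies kappa s / 6 to the right of the median
    return median, s, kappa, average


# Ages 13, 15, 16 with z3 = .536, z1 = 1.112, z2 = 1.866 as quoted in the text.
median, s, kappa, average = solve_three_points(13, 0.536, 15, 1.112, 16, 1.866)
print("m =", median, " s =", s, " kappa =", kappa, " average =", average)
```

Carried out with three-figure values of z, the solution agrees with the printed m, s and average to about the third decimal place; κ comes out slightly higher than the printed .234, a difference that may be due to rounding in the tabulated z.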




Notice that the curves representing the first and second
approximations intersect at $x = s\sqrt{3}$.

The  area  of  the  skew  curve  standing  on  MN  =  the  area  of 
the  normal  curve  standing  on  ON  when  x  =  ±  s. 

The  excess  of  the  skew  curve  over  the  normal  curve  for 
any  distance  ON  to  the  right  =  the  defect  for  the  same  distance 
to  the  left. 

7. — Ratio of Unweighted Averages.

Let $M_1, M_2 \ldots M_n$ be the true measurements of $n$ quantities at
one time, and $M_1', M_2' \ldots$ of similar quantities at another time.

Let $nm = \mathrm{S}M_t$, $M_t = m + m_t$, $\mathrm{S}m_t = 0$, $nm' = \mathrm{S}M_t'$, $M_t' = m' + m_t'$,
$\mathrm{S}m_t' = 0$, $n\sigma_m^2 = \mathrm{S}m_t^2$, $n\sigma_{m'}^2 = \mathrm{S}m_t'^2$.

Let $m' = m(1 + \rho)$, $M_t' = (1 + u + u_t)M_t$ where $\mathrm{S}u_t = 0$, and

$$\rho = u + \frac{\mathrm{S}m_t u_t}{nm}.$$

Here $u$ measures the mean of the ratio increases of the
quantities, and $\rho$ measures the ratio increase in the mean of
the quantities. These tend to be equal, if the larger
quantities are not on the whole subject to the larger
increases, or conversely.

Suppose the quantities to be erroneously measured as
$M_t(1 + e_t)$ and $M_t'(1 + e_t')$ etc. Then by formula (70) the standard


deviations  of  the  errors  in  m  and  m'  are  I  (  j  \  an(q 


Vn 


ml 


where $\sigma$ and $\sigma'$ are typical of $e_t$ and $e_t'$.

If  the  errors  in  the  two  sets  of  measurements  are  inde¬ 
pendent  of  each  other,  then  (by  p.  318,  formula  (63)), 


qw2\1 


where $s_r$ is the standard deviation of errors in $\frac{m'}{m}$, i.e. in $1 + \rho$.

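It is frequently the case, however, that the error $e_t'$ in
the measurement of $M_t'$ is of the same sign and not far from
the same magnitude as $e_t$, the error in the earlier measurement
of the corresponding $M_t$.

Write $d_t = e_t' - e_t$.

Then if $e$ is the resulting error in the ratio of the averages

$$\frac{m'}{m}(1 + e) = \frac{\mathrm{S}\{M_t'(1 + e_t')\}}{\mathrm{S}\{M_t(1 + e_t)\}}$$

$$e = \frac{\mathrm{S}\{M_t'(1 + e_t')\} \cdot \mathrm{S}M_t - \mathrm{S}\{M_t(1 + e_t)\} \cdot \mathrm{S}M_t'}{\mathrm{S}\{M_t(1 + e_t)\} \cdot \mathrm{S}M_t'}$$

$$= \frac{m\,\mathrm{S}(M_t' e_t') - m' \cdot \mathrm{S}(M_t e_t)}{m'\,\mathrm{S}\{M_t(1 + e_t)\}}$$

$$= \frac{m\,\mathrm{S}(M_t' d_t) + \mathrm{S}\{(mM_t' - m'M_t)e_t\}}{nmm'},$$

neglecting $e_t^2$ and $e_t e_t'$,

$$= \frac{\mathrm{S}(M_t' d_t)}{nm'} + \frac{\mathrm{S}\{(u_t + u - \rho)M_t e_t\}}{nm'} = \frac{\mathrm{S}(M_t' d_t)}{nm'} + \frac{\mathrm{S}(M_t u_t e_t)}{nm'},$$

if $u - \rho$ is neglected.

Hence if $s_r$ is the standard deviation of $e$, and $\sigma_d$, $\sigma$ the
standard deviations of $d_t$ and $e_t$, or their weighted standard
deviations if they are not all from identical frequency curves,

$$s_r^2 = \frac{1}{n^2 m'^2} \cdot \sigma_d^2 \cdot \mathrm{S}(M_t'^2) + \frac{1}{n^2 m'^2} \cdot \sigma^2 \cdot \mathrm{S}(M_t^2 u_t^2), \text{ by formula (55).}$$

Now

$$\mathrm{S}(M_t')^2 = \mathrm{S}(m' + m_t')^2 = n(m'^2 + \sigma_{m'}^2),$$

and

$$\mathrm{S}(M_t^2 u_t^2) = nm^2\sigma_u^2 + n\sigma_m^2\sigma_u^2 + \mathrm{S}\,u_t^2(m_t^2 - \sigma_m^2) + 2m\,\mathrm{S}(m_t u_t^2), \text{ where } n\sigma_u^2 = \mathrm{S}\,u_t^2,$$

$$= n\sigma_u^2(m^2 + \sigma_m^2) + \text{terms which tend to be negligible.}$$

$$\therefore\ s_r^2 = \frac{1}{n} \cdot \sigma_d^2\left( 1 + \frac{\sigma_{m'}^2}{m'^2} \right) + \frac{1}{n} \cdot \sigma^2 \cdot \left( 1 + \frac{\sigma_m^2}{m^2} \right) \cdot \frac{\sigma_u^2}{(1 + \rho)^2} \text{ approx.} \quad \ldots (146)$$

If $e_t'$ and $e_t$ were independent $\sigma_d^2$ would equal $\sigma^2 + \sigma'^2$,
while if $e_t = e_t'$ etc. $\sigma_d$ would be zero. Hence $\sigma_d$ may be
regarded as between 0 and $\sigma\sqrt{2}$.

The magnitude of the second term depends on $\sigma_u$, which
measures the variation in the rates of increase of the different
quantities, and is known from the observations.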

Hence  if  similar  errors  are  made  in  observations  at  both 
dates  of  quantities  which  increase  at  nearly  the  same  rates, 
the  error  in  the  ratio  of  the  computed  averages  is  small,  and, 
if  n  is  great,  very  small. 
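Formula (146) can be evaluated directly from two series of measurements. The sketch below is a hedged illustration: the function name is hypothetical, u and u_t are taken from the observed ratios M_t'/M_t (so that their deviations sum to zero), and the values of σ and σ_d as well as the second pair of series are assumed purely for the example.

```python
import math


def ratio_error_sd(first, second, sigma, sigma_d):
    """Standard deviation s_r of the error in m'/m, by formula (146).

    first, second : the quantities M_t and M_t' at the two dates
    sigma         : standard deviation of the relative errors e_t
    sigma_d       : standard deviation of the differences d_t = e_t' - e_t
    """
    n = len(first)
    m = sum(first) / n
    m_dash = sum(second) / n
    var_m = sum((x - m) ** 2 for x in first) / n
    var_m_dash = sum((x - m_dash) ** 2 for x in second) / n
    ratios = [b / a for a, b in zip(first, second)]   # 1 + u + u_t
    mean_ratio = sum(ratios) / n                       # 1 + u
    var_u = sum((r - mean_ratio) ** 2 for r in ratios) / n
    rho = m_dash / m - 1.0
    term_d = sigma_d ** 2 / n * (1.0 + var_m_dash / m_dash ** 2)
    term_e = sigma ** 2 / n * (1.0 + var_m / m ** 2) * var_u / (1.0 + rho) ** 2
    return math.sqrt(term_d + term_e)


# Equal quantities all increasing at the same rate: only the sigma_d term survives.
simple = ratio_error_sd([10, 10, 10, 10], [11, 11, 11, 11], sigma=0.05, sigma_d=0.02)
print(simple)

# Assumed illustrative series with unequal sizes and unequal rates of increase.
varied = ratio_error_sd([8, 12, 20, 30, 45], [9, 13, 23, 32, 50], sigma=0.05, sigma_d=0.02)
print(varied)
```

In the first case the result reduces to σ_d/√n, which shows plainly how similar errors at both dates (small σ_d) and a large n make the error of the ratio small.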
8. — Ratio of Weighted Averages.


In the case of weighted averages the formula becomes more
complex.

Let $W_t = w + w_t$, and $W_t' = w' + w_t'$ be any pair of weights at
the two dates, where $w$, $w'$ are the averages of the weights. Write
$n\sigma_w^2 = \mathrm{S}w_t^2$ and $n\sigma_{w'}^2 = \mathrm{S}w_t'^2$.

Let $W_t' = W_t(1 + v + v_t)$, where $\mathrm{S}v_t = 0$, and write $n\sigma_v^2 = \mathrm{S}v_t^2$.

Suppose $W_t(1 + \eta_t)$ and $W_t'(1 + \eta_t')$ to be taken in error for
$W_t$, $W_t'$, and write $\sigma'$ for the standard deviation of $\eta_t$.

Let

$$m_w = \frac{\mathrm{S}(W_t M_t)}{\mathrm{S}W_t} \quad \text{and} \quad m_w' = \frac{\mathrm{S}(W_t' M_t')}{\mathrm{S}W_t'}.$$

Other letters have the same meaning as in the previous note.
Required the error in $\frac{m_w'}{m_w}$, say $e$.

$$\frac{m_w'}{m_w}(1 + e) = \frac{\mathrm{S}\{W_t'(1 + \eta_t')M_t'(1 + e_t')\}}{\mathrm{S}\{W_t(1 + \eta_t)M_t(1 + e_t)\}} \cdot \frac{\mathrm{S}\{W_t(1 + \eta_t)\}}{\mathrm{S}\{W_t'(1 + \eta_t')\}}$$

and hence, after reduction in which products and squares of $\eta_t$, $e_t$
are neglected,

$$e = \frac{\mathrm{S}(W_t' M_t' e_t')}{nw'm_w'} - \frac{\mathrm{S}(W_t M_t e_t)}{nwm_w} + \frac{\mathrm{S}\{W_t[m_w - M_t]\eta_t\}}{nwm_w} - \frac{\mathrm{S}\{W_t'[m_w' - M_t']\eta_t'\}}{nw'm_w'} \quad \ldots (147)$$

To obtain approximate results neglect all sums of products,
where the sum of the factors of one kind is zero. This leads to
taking $m_w = m$, $m_w' = m'$, $u = \rho$, $w' = (1 + v)w$, and to further
simplifications in the reduction.

Write $d_t' = \eta_t' - \eta_t$ and $\sigma_d'$ for its standard deviation.


_ S  (WtMtdt)  ,  S(WtMtut)  _  ,  S(WMm)  , 

i  liGri  &  — » — /  1  — # — /  r  — # — . 

nw  m  nw  m  nwm 

.  S(W t'mt'di!)  ,  SWt'Mtui)  ,  S  (Wtmtvt) 

H - ~ - - =7=7 —  vt  H - — , — —  vt- 

nw  m  nw  m  nw  m 


Hence  approximately 
The terms involving $\sigma^2$, $\sigma_d^2$ (which measure the errors in
the quantities $M_t$) are similar to those when the average is
unweighted, except for a factor (greater than 1 and generally
less than 2) involving weights, and a term involving also the
small factor $\sigma_v^2$ which measures the variation in the change of
weights.

Of the three terms involving $\sigma'^2$, $\sigma_d'^2$ (which measure the
errors in weights) the first and the third contain the factors


small  when  the  rates  of  increase  of  the  quantities  are  nearly 
equal. 

The actual values of all the coefficients of $\sigma$, $\sigma'$, $\sigma_d$, $\sigma_d'$ can
be  obtained  from  the  observations,  and  their  relative  importance 
discovered  ;  but  we  can  say  without  evaluation  that  when 
quantities  little  dispersed  increase  at  rates  not  far  from  equal, 
errors  in  weights  have  little  importance  as  compared  with 
equal  errors  in  quantities. 

In  such  cases  a  first  approximation  would  be 


but  if  is  not  small,  a  better  approximation  would  be 


1  +  ii 


(150) 


It is seldom that $\sigma_d$, which measures the difference of errors,
is small compared with one error, though it is likely to be less
than $\sqrt{2} \cdot \sigma$.

It  is  advisable  to  test  the  coefficients  roughly  from  the 
observations  before  neglecting  terms  ;  and  also  where  there 
are  any  signs  that  the  neglected  products  are  not  small,  or 
any  of  the  errors  are  likely  to  be  specially  large,  the  unabridged 
form  (147)  should  be  used. 

(See Statistical Journal, 1911-12, pp. 81-88, “Measurement
of the Accuracy of an Average.”)
9. — Normality of Standard Deviations of the Errors in Moments, etc.


[Based  on  Sheppard's  “  Application  of  the  Theory  of 
Error,”  Transactions  of  the  Royal  Society,  Vol.  192,  1898, 
A.  229,  pp.  117-128,  but  with  modifications  in  notation  and 
treatment.] 

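In a universe containing $N$ things $p_1N$ are at $x_1$, $p_2N$ at $x_2 \ldots$;
$p_1 + p_2 + \ldots = 1$, $F = a_1p_1 + a_2p_2 + \ldots$ where $a_1, a_2 \ldots$ are
constants.

In a selection of $n$ things, $n_1$ are found at $x_1$, $n_2$ at $x_2 \ldots$,
$n_1 + n_2 + \ldots = n$.

Write

$$F + f = a_1\frac{n_1}{n} + a_2\frac{n_2}{n} + \ldots$$

$$f = a_1\frac{n_1}{n} + a_2\frac{n_2}{n} + \ldots - F\left( \frac{n_1}{n} + \frac{n_2}{n} + \ldots \right) = b_1\frac{n_1}{n} + b_2\frac{n_2}{n} + \ldots$$

where $b_1 = a_1 - F$ etc.

Then $\mathrm{S}b_tp_t = \mathrm{S}a_tp_t - F \cdot \mathrm{S}p_t = F - F = 0$,

$\mathrm{S}b_t^2p_t = \mathrm{S}a_t^2p_t - 2F\,\mathrm{S}a_tp_t + F^2 \cdot \mathrm{S}p_t = \mathrm{S}a_t^2p_t - F^2$.

Required to find $M_s$, = mean $f^s$, and to show that its relation to
$M_2$, = mean $f^2$, is that found in the normal curve of error.

The expression

$$E = \left( p_1 e^{\frac{\alpha b_1}{n}} + p_2 e^{\frac{\alpha b_2}{n}} + \ldots \right)^n,$$

expanded by the multinomial theorem, gives the sum of terms

$$\frac{n!}{n_1!\, n_2! \ldots}\left( p_1 e^{\frac{\alpha b_1}{n}} \right)^{n_1}\left( p_2 e^{\frac{\alpha b_2}{n}} \right)^{n_2} \ldots$$

subject to the condition $n_1 + n_2 + \ldots = n$

= sum of terms $P \cdot e^{\alpha f}$,

where

$$f = \frac{n_1b_1 + n_2b_2 + \ldots}{n}$$

and

$$P = \frac{n!}{n_1!\, n_2! \ldots}\, p_1^{n_1} p_2^{n_2} \ldots$$

and is the whole chance that the selection $n_1$ at $x_1$, $n_2$ at $x_2 \ldots$
should be made, as may be seen by expanding the multinomial

$$(p_1 + p_2 + \ldots)^n.$$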
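$$E = \text{sum of } P\left( 1 + \alpha f + \frac{\alpha^2}{2!}f^2 + \ldots + \frac{\alpha^s}{s!}f^s + \ldots \right)$$

$$= M_0 + \alpha M_1 + \frac{\alpha^2}{2!}M_2 + \ldots + \frac{\alpha^s}{s!}M_s + \ldots$$

Also

$$E = \left( \mathrm{S}p_t + \frac{\alpha}{n}\mathrm{S}b_tp_t + \frac{\alpha^2}{2n^2}\mathrm{S}b_t^2p_t + \frac{\alpha^3}{6n^3}C_3 + \frac{\alpha^4}{24n^4}C_4 + \ldots \right)^n$$

from the expansion of the terms $e^{\frac{\alpha b_t}{n}}$, where
$C_3 = \mathrm{S}b_t^3p_t$, $C_4 = \mathrm{S}b_t^4p_t \ldots$, and $\mathrm{S}p_t = 1$, $\mathrm{S}b_tp_t = 0$.

$$\therefore\ E = \left( 1 + \frac{\alpha^2}{2n^2}\mathrm{S}b_t^2p_t + \ldots \right)^n.$$

Equating the first three coefficients in the two expressions for $E$,

$$M_0 = 1, \quad M_1 = 0, \quad M_2 = \frac{1}{n}\mathrm{S}b_t^2p_t.$$

$$\therefore\ 1 + \frac{\alpha^2}{2}M_2 + \ldots + \frac{\alpha^s}{s!}M_s + \ldots = \left( 1 + \frac{\alpha^2}{2n}M_2 + \frac{\alpha^3}{6n^3}C_3 + \ldots \right)^n.$$

Now when $n$ is large, and $M_2$, $\frac{C_3}{n^{3/2}M_2^{3/2}}$, $\frac{C_4}{n^2M_2^2}$, $\ldots$ are finite, we have,
if we neglect $\frac{1}{\sqrt{n}}$,

$$1 + \frac{\alpha^2}{2}M_2 + \ldots + \frac{\alpha^s}{s!}M_s + \ldots = \left( 1 + \frac{\alpha^2}{2n}M_2 \right)^n = e^{\frac{\alpha^2}{2}M_2}$$

$$= 1 + \frac{\alpha^2}{2}M_2 + \ldots + \frac{\alpha^{2t}}{t!\,2^t}M_2^t + \ldots$$

Hence, in this case, $M_s = 0$ if $s$ is odd, and $M_{2t} = \frac{(2t)!}{t!\,2^t}M_2^t$, as
in the normal curve of error.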
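The approach to the normal relation between the moments can be seen by direct enumeration in a small case. The sketch below is illustrative only: it assumes a universe of three classes with the chances p and constants a shown (chosen arbitrarily for the example), enumerates every possible selection of n = 60, and forms the exact mean values of the powers of f. The function name is hypothetical.

```python
from math import factorial


def exact_error_moments(p, a, n, orders=(1, 2, 3, 4)):
    """Exact M_s = mean of f**s for a universe of three classes, by enumeration."""
    F = sum(pi * ai for pi, ai in zip(p, a))
    b = [ai - F for ai in a]
    moments = {s: 0.0 for s in orders}
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            n3 = n - n1 - n2
            # P: the whole chance of the selection n1, n2, n3 (multinomial term)
            ways = factorial(n) // (factorial(n1) * factorial(n2) * factorial(n3))
            chance = ways * p[0] ** n1 * p[1] ** n2 * p[2] ** n3
            f = (b[0] * n1 + b[1] * n2 + b[2] * n3) / n
            for s in orders:
                moments[s] += chance * f ** s
    return b, moments


p = (0.2, 0.5, 0.3)   # assumed chances p_1, p_2, p_3
a = (0.0, 1.0, 2.0)   # assumed constants a_1, a_2, a_3
n = 60
b, M = exact_error_moments(p, a, n)
second = sum(bt ** 2 * pt for bt, pt in zip(b, p)) / n   # (1/n) S b_t^2 p_t
print("M1 =", M[1])
print("M2 =", M[2], " compare", second)
print("M4 / M2^2 =", M[4] / M[2] ** 2, " (normal value 3)")
```

The second moment agrees with (1/n)S b_t²p_t, and the ratio of the fourth moment to the square of the second lies close to 3, the departure shrinking as n is increased.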

The conditions that $\frac{C_3}{n^{3/2}M_2^{3/2}}$ etc. are finite are similar to those
in the Edgeworthian analysis on pp. 295 seq., but need consideration
for each case to which the theorem is applied.

Thus on pp. 419-20 $f = x_1^2e_1 + x_2^2e_2 + \ldots$ where $e_1$ is $\frac{n_1}{n} - p_1$,
$F = \mu_2$ the second moment of the universe from which the
selection is made, and $b_t = x_t^2 - \mu_2$.

$$M_2 = \frac{1}{n}\mathrm{S}(x_t^2 - \mu_2)^2p_t = \frac{1}{n}(\mu_4 - \mu_2^2)$$

$$C_3 = \mathrm{S}(x_t^2 - \mu_2)^3p_t = (\mu_6 - 3\mu_4\mu_2 + 2\mu_2^3)$$

$$\frac{C_3}{n^{3/2}M_2^{3/2}} = \frac{\mu_6 - 3\mu_4\mu_2 + 2\mu_2^3}{(\mu_4 - \mu_2^2)^{3/2}}$$

Similarly

$$\frac{C_4}{n^2M_2^2} = \frac{\mu_8 - 4\mu_6\mu_2 + 6\mu_4\mu_2^2 - 3\mu_2^4}{(\mu_4 - \mu_2^2)^2} \text{ etc.}$$

Now if the ratios $\frac{\mu_4}{\sigma^4}$, $\frac{\mu_6}{\sigma^6}$, $\frac{\mu_8}{\sigma^8}$, $\ldots$ are finite, where $\sigma^2 = \mu_2$, we
have that $\frac{C_3}{n^{3/2}M_2^{3/2}}$, $\frac{C_4}{n^2M_2^2} \ldots$ are finite, as was required.

Hence if the curve of frequency of the universe satisfies
these conditions, which correspond in fact to a reasonable
concentration about the average, with no groups of importance
beyond a small multiple of $\sigma$, the curve of frequency for the
errors of the second moment (and of the standard deviation)
is normal.

A similar but simpler analysis shows the errors of the
average have normal frequency ($f = x_1e_1$ etc., $F = 0$, $b_t = x_t$).

In  the  case  of  the  analysis  of  the  correlation  coefficient 
(p.  422) 


fxtyt 
\  M  2/\ 


xt~ 


M,  =  -  Sr* 
2  n 


yr 

2/x 


—  i  +  2  4*  and  pt  —  zt. 


xt 2  yt2\ 2 

2\  2/x/  Zt 


ry  /M22  1  A4  j  JH  ^31  M-13  i  ^22^ 

n  \M2  4X2  4/ x 2  MX  M/x  4X/X/ 


C,  -  Sr3 


'xt yt  xt 2  yt2\3  _  3 
J  “  2X  2/x/  Zt  ~  r 


^33  1  A6  I 
v  M3  8X3  ^  ’ 


Writing  X  =  erf,  /x  =  tr22,  we  have,  if  — - ,  ,  — —  are  finite  for 

OV  O'?1 


c. 


crxscrf 


all  values  of  s  and  t,  then  — h  =  M2:j  X  finite  quantity,  and  higher 


terms  can  be  similarly  dealt  with  as  before. 

Hence  if  the  moments  and  products  of  the  two-dimensional 
frequency  distributions  satisfy  the  conditions  already  described,  the 
error  curve  of  the  correlation  coefficient  is  normal. 


10. — The  Method  of  Least  Squares. 

This  is  a  method  that  has  for  a  long  time  been  used  for 
assigning  the  values  to  be  taken  when  there  are  a  number  of 
inexact  measurements  at  choice. 

Suppose a quantity $z$ to be related to $k$ unknown constants
$x_1, x_2 \ldots x_k$ by the equation $z = u_1x_1 + u_2x_2 + \ldots + u_kx_k$ where
$u_1, u_2 \ldots$ are quantities that can be observed; and let $n$ sets
of observations be made giving

$${}_1z = {}_1u_1x_1 + {}_1u_2x_2 + \ldots + {}_1u_kx_k$$

$$\ldots$$

$${}_nz = {}_nu_1x_1 + {}_nu_2x_2 + \ldots + {}_nu_kx_k$$

where the $z$'s and $u$'s are known.

If $n = k$, the $x$'s can be exactly determined. If $n < k$, there
are an infinite number of solutions and the equations are
indeterminate.

If $n > k$ the equations are in general inconsistent, and the
problem is to assign values to $x_1, x_2 \ldots$ which minimise the
inconsistency, which is assumed to be due to imperfect measurements
of the $u$'s.

Write $d_1, d_2, \ldots$ for the differences between ${}_1z, {}_2z \ldots$ and the
values obtained from true values of $x_1, x_2 \ldots$, say $X_1, X_2 \ldots$

Then

$${}_1u_1X_1 + {}_1u_2X_2 + \ldots + {}_1u_kX_k - {}_1z = d_1$$

$${}_2u_1X_1 + {}_2u_2X_2 + \ldots + {}_2u_kX_k - {}_2z = d_2$$

It is assumed that $d_1, d_2 \ldots$ are errors whose chances are
given by a normal curve $P = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{d^2}{2\sigma^2}}$. The assumption


is  generally  based  on  demonstrations  that  under  certain 
hypotheses  as  to  the  nature  of  accidental  errors  this  normal 
form  is  obtained.  Whatever  may  be  the  validity  of  these 
hypotheses  in  physical  or  geodetical  measurements,  it  is  not 
safe  to  assume  that  they  apply  to  statistical  or  biometric 
measurements,  whether  of  deviations  from  an  average  or 
errors  due  to  sampling. 

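The solution is obtained by finding those values of $X_1, X_2 \ldots$
which make the probability that $d_1, d_2 \ldots$ would occur together
a maximum, that is which make the sum of $d_1^2 + d_2^2 + \ldots$ a
minimum. Write $f(d_1, d_2 \ldots)$ for this sum.

The conditions for a minimum are $\frac{\partial f}{\partial X_1} = 0$, $\frac{\partial f}{\partial X_2} = 0$, $\ldots$

These give

$$\frac{1}{2} \cdot \frac{\partial f}{\partial X_1} = \mathop{\mathrm{S}}_{t=1}^{t=n}\, {}_tu_1\left( {}_tu_1X_1 + {}_tu_2X_2 + \ldots - {}_tz \right) = 0$$

$$\frac{1}{2} \cdot \frac{\partial f}{\partial X_2} = \mathop{\mathrm{S}}_{t=1}^{t=n}\, {}_tu_2\left( {}_tu_1X_1 + {}_tu_2X_2 + \ldots - {}_tz \right) = 0$$

$$\ldots$$

which may be written

$$X_1 \cdot \mathrm{S}u_1^2 + X_2 \cdot \mathrm{S}u_1u_2 + \ldots = \mathrm{S}u_1z$$

$$X_1 \cdot \mathrm{S}u_1u_2 + X_2 \cdot \mathrm{S}u_2^2 + \ldots = \mathrm{S}u_2z$$

$$\ldots$$

$$X_1 \cdot \mathrm{S}u_1u_k + X_2 \cdot \mathrm{S}u_2u_k + \ldots = \mathrm{S}u_kz,$$

$k$ equations giving the $k$ quantities $X_1, X_2 \ldots$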

This  method  is  found  in  practice  to  give  quite  generally 
good  empirical  values  of  the  unknown  quantities.  It  is  used 
above  on  pp.  239,  240  in  its  simple  form,  and  a  corresponding 
method  where  the  validity  can  be  tested  is  used  on  pp.  364-5. 
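The formation and solution of the k equations can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the procedure of the pages cited: the function names are hypothetical, the equations are solved by ordinary elimination with pivoting, and the observations are invented for the example (one consistent set and one slightly inconsistent set).

```python
def normal_equations(u_rows, z):
    """Form S(u_i u_j) and S(u_i z) from n observed sets (u_1 ... u_k, z)."""
    k = len(u_rows[0])
    coeff = [[sum(row[i] * row[j] for row in u_rows) for j in range(k)]
             for i in range(k)]
    rhs = [sum(row[i] * zt for row, zt in zip(u_rows, z)) for i in range(k)]
    return coeff, rhs


def solve_linear(coeff, rhs):
    """Solve the k equations by elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [r] for row, r in zip(coeff, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


# Four sets of observations of z = X1*u1 + X2*u2, with u1 = 1 throughout.
u_rows = [[1, 1], [1, 2], [1, 3], [1, 4]]
z_exact = [5, 8, 11, 14]           # consistent with X1 = 2, X2 = 3
z_rough = [5.1, 7.8, 11.2, 13.9]   # assumed inexact measurements

X_exact = solve_linear(*normal_equations(u_rows, z_exact))
X_rough = solve_linear(*normal_equations(u_rows, z_rough))
print(X_exact)
print(X_rough)
```

When the observations are consistent the method returns the constants exactly; when they are not, it returns the values that make the sum of the squares of the differences least.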

(See  Merriman,  Method  of  Least  Squares ,  and  Weld,  Theory 
of  Errors  and  Least  Squares.) 


The  thick  line  A0  Ax  A2  A3  As  represents  the  normal 
curve  of  error. 

$C_1 C_2 C_3$ is a curve of error with the same unit of
abscissæ as $A_1 A_2 A_3$, but with ordinates diminished
in the ratio 4 to 1.

$B_1 B_2 B_3$ is a curve of error with both ordinates and
abscissæ half those of $A_1 A_2 A_3$.

The areas of $B_1 B_2 B_3$ and $C_1 C_2 C_3$ are equal; but
the standard deviation of the former is half that of the
latter, and it represents observations of twice the
precision.

The area contained between the vertical lines through
$P_1$, $P_2$ and the curve $A_1 A_2 A_3$ and $X\,O\,X_1$ is half the
area between the curve and $X\,O\,X_1$; similarly for
$C_1 C_2 C_3$; similarly with lines through $p_1$, $p_2$ for
$B_1 B_2 B_3$.

$P_1$, $P_2$, $p_1$, $p_2$, are positions of probable errors.

$M_1$, $M_2$, $m_1$, $m_2$, are positions of moduli.

$E_1$, $E_2$, $e_1$, $e_2$, are positions of the mean errors.

Sj,  S2,  are  positions  of  standard  deviations,  and 
Sx,  S2,  S3,  S4,  are  points  of  inflexion. 

$A_2 F_1 F_2$ represents half the binomial $(\frac{1}{2} + \frac{1}{2})^4$

$A_2 G_1 G_2 G_3 G_4 G_5$ . . . . . . $(\frac{1}{2} + \frac{1}{2})^{10}$

$A_2 H_1 H_2 \ldots H_9$ . . . . . . $(\frac{1}{2} + \frac{1}{2})^{32}$

These are constructed on such a scale as to approximate
to the curve $A_1 A_2 A_3$. Parts of the 32 curve
cannot be distinguished from the curve of error.


- 

At  End, 


X 


CORRIGENDA 


Page

36, line 2 from foot, delete £.

37, ll. 12, 13, interchange broken time and overtime, and for above
read below.

108, l. 13, for a1 + a2 etc. read a1 × a2 etc.

112, l. 11 from foot, for ar2 read a2.

115, l. 7, for dn-2 read dn-1, and for dn-3 read dn-2.

123, Table II, last col., read v. p. 116.

162, Diagram, fig. 4, for 1885 read 1855.

182, last six lines, change signs throughout.

186, middle of page, for read /

S m b m '

188, l. 6 from foot and last line, and p. 189, l. 3, for − .0008,
.0135, .153 read + .0008, + .0151, .0153.

209, Table, for Quality read Quantity (twice).

223, l. 1, read yr + t-2 in third term.

224, note, in equation (1), for (n + 1) read (3n + 1).

227, l. 23, insert A' in last term.

252, l. 6, and p. 255, middle, the meaning of k > or < 3 is incorrectly
expressed.


X  ^ 

267,  equation  (19),  the  second  index  should  be  —  ;  equation  (20), 


X  X  ^ 

last  bracket  should  be  —  —  — - 

a  3  rr* 


30 


y 


270,  1.  2,  for J  read  J  .  ^ 

285,  1.  2,  index  of  tq  is  —  r. 

292,  last  line,  for  xun 2  read  2un 2.  ^ 

304,  second  line  under  table,  for  x2  read  x2.  / 

320,  in  equation  for  e,  for  SW/(i  +  r)t)  read  S{W;(i  -f-  rjt)}.  ^ 

359,  replace  last  three  lines  by — 

“  In  expansion  of  left-hand  side,  the  coefficient  of  a2&  /32/  in 

ea x  +  pY  occurs  in  — - — -  ,  x  Mean  (aX  4-  (3Y)zk  +  2/,  and 
'  [2k  -j-  2.1)  ! 


equals  •  7  v  I<  ■ —  x  Mean  (Xzk  Y21),  where  k  and  l  are  any 
(2k)  !  (2I)  ! 

integers.  v 

360,  at  beginning  insert  “  Right-hand  side.”  ^ 

363,  1.  10,  after  ellipses  insert  and  on  either  side  of  =  — . 

(Ty  (TX 

366, l. 19, reference is to p. 365, not p. 276.

373, l. 1, for N read N3.

375, l. 9, for 1 read − 1.

414, l. 2 from foot, for px read p1 throughout.
417, l. 6, read “m2 is of the same order as σ2.”
420, in equation (126) after p2, for np − 1 read μ2p − 1.
l. 2 from foot, for σ = read σx =.

424, l. 20, in denominator of coefficient of p2, for (1 − r'2) read
(1 − r'2)2.

428, last line, denominator of 2re1e2 is σ1σ2.

431, heading of second column of table, for X read X2.
(References to definitions are printed thus: — 84.)


Abstract, Annual of Labour Statistics, 11, 54, 163, 197

—  Statistical  of,  United  Kingdom,  11 
Accuracy,  178  seq. 

Actuaries,  Text  Book  of,  241 
Ages,  26,  128,  130,  235 
Agricultural  Wages,  75-81,  84-7 
Arithmetic  Average  or  Mean,  82-3, 
84,  85-6,  104,  109  ;  error  in,  183 
Association,  370  ;  coefficient  of,  370 
Asymmetry,  116  ;  see  Skewness 
Attributes,  see  Characteristics 
Average,  errors  in,  183,  319,  415 
Averages, 67, 69, 70, 73, 82-109,
117-124; see Arithmetic Average,
Geometric Mean, Median, Mode,
Weighted Average

Bernoulli’s  Laws,  273 
Bertillon,  J.,  14,  95,  141 
Biassed  Errors,  190-5 
Biometrika,  271,  369,  376,  409,  431, 
439 

Birth-rates,  95 

Blank  Forms  or  Schedules,  15-6, 
23-4,  28  ;  specimens  of,  22,  32-3, 
40,  45-6,  49 
Boole,  231,  241 

Booth,  C.,  10,  29,  30,  57-8,  141,  234, 
236 

Bortkiewicz,  L.  von,  285 
British  Association  Index-Number, 
207 

Brown,  W.,  369 

Budgets  of  expenditure,  209-10 
Burnett-Hurst,  A.  R.,  10 

Cartograms,  141 
Cave,  B.  M.,  376 
Cave,  F.  E.,  376 

Census  :  Population,  18,  20  seq., 

57-61  ;  householder’s  schedule,  22 

—  Production,  27,  51 

—  Wage,  12,  30  seq.,  70-5,  89 
Central  Differences,  228,  240 


Chance  and  Experience,  272-4  ;  see 
Probability 

Changes  in  Wages,  75-81 
Characteristics  or  Attributes,  19,  53, 
330 

Chauvenet,  241 
Chrystal,  G.,  436 
Coefficient  of  Association,  370 
Colligation,  370 
Contingency,  374 
Correlation,  353 
Regression,  362,  409 
Variation,  116 
Coefficients,  Statistical,  94-5 
Collection  of  material,  15 
Comparisons,  Accuracy  of,  193  ; 
errors  in,  326 

Comparisons  of  Series,  149  seq.,  172-6, 
374  seq. 

Compensating  Fluctuations,  148 
Consumption,  Index-number  of,  212 
Contingency,  371-3  ;  coefficient  of, 
374 

Correlation,  62,  350  seq.  ;  coefficient 
of,  353,  354  seq.,  366  ;  ratio,  366  ; 
surface,  356  seq.,  396-7  ;  of  time 
series,  374-8  ;  see  Partial  Correla¬ 
tion 

Curve  of  Error,  see  Error,  Law  of 
Curves  of  Frequency,  246-258 
Curve  of  Regression,  352 
Cycles  of  Trade,  148,  164 

Darwin,  G.  H.,  239 
De  Morgan,  230 
Death-rates,  95 
Deciles,  102,  105,  107 
Demography,  7,  20 
Density,  Greatest,  98  ;  see  Mode 
Deviations,  104,  110  ;  mean,  111, 
249  ;  see  Standard  Deviation 
Diagrams,  125-175 
Differences,  222  seq.  ;  significance  of, 
329-337 

Dispersion, 110-120
Earnings, see Wages
Economist,  12,  205-6 
Edgeworth,  F.  Y.,  96,  169,  205,  236, 
252,  268,  295,  346,  358,  397,  409, 

4X4»  4*8 

Elderton,  W.  P.,  256,  344,  357,  368, 
373-4,  405 

Employment,  see  Unemployment 
Equation  of  Regression,  362-4,  400, 
405-6 

Error,  Absolute,  316,  317  ;  relative, 
180,  181  seq.,  318-322 
Error, Curve of and Law of, 261 seq.,
290 seq.; equation of curve, 268,
295, 302, 441; and diagram at end
Error  of  mean  square,  see  Standard 
Deviation 

Everett,  J.  D.,  241 
Euler-Maclaurin  Theorem,  436 
Exports,  49,  134,  171,  173,  194 

Farr,  Dr.,  241 

Foreign  Trade,  43  seq.,  131,  134,  152, 
156,  170,  173,  201-3,  234 
Forms  of  Enquiry,  see  Blank  Forms 
Fox,  W.,  78 

French  Wage  Census,  37-8 
Frequency  Curves  and  Groups, 
246  seq. 

Gabaglio,  A.,  140 
Galton,  F.,  104,  106 
Geometric  Mean,  107,  205 
Gibson,  G.  A.,  434 
Giffen,  R.,  134-6 
Gini, C., 114-5

Graphic Method, see Diagrams; of
interpolation, 219-221
Great  Numbers,  Law  of,  287  seq. 

Hardy,  G.  F.,  256,  344,  349 
Historical  Diagrams,  142  seq. 

Hooker,  R.,  375,  376 

Imports,  46,  171,  173,  387 
Incomes,  Pareto’s  Law  of,  346 
Index-numbers, 196-213; of consumption,
212; of wholesale prices,
201-5, 324-6; of retail prices, 200,
208-212; of wages, 197, 212
Interpolation,  214  seq. 

Inverse  Probability,  409  seq. 

Isserlis,  L.,  301 

Jevons, W. S., 108, 159, 160

"  Labour  Gazette,”  12,  220 
Lagrange’s  Interpolation  Formula, 
229,  235 

Least  Squares,  137,  239,  364,  378, 

452-4 
Le  Play,  7 


Levasseur,  140-1 
Levi,  Leone,  10 
Logarithmic  Curves,  169  seq. 
Logarithmic  Mean,  107 
Logarithms,  Table  of,  176-7 

Makeham’s  Formula,  349 
Marriage-rate,  95,  156,  174,  387 
Marshall,  A.,  8,  171 
Maximum  Ordinate,  98  ;  see  Mode 
Mean  Deviation,  111,  249,  269,  270 
Mean Difference, 111, 114
Means, see Arithmetic Mean, Logarithmic
Mean, Median, Mode
Median, 102, 103-7, 109, 444; determination
of, 106, 138, 227, 236
Merrifield,  241 
Merriman,  454 

Method  of  Least  Squares,  137,  452-4 
Mitscherlich,  309 

Mode,  95-7,  98,  99-102,  109,  442  ; 
determination  of,  98,  139,  227-8, 

237 

Modulus,  252 

Moments,  250-8  ;  of  correlation 
surface,  361-2  ;  of  normal  curve, 
269,  441 

Moore,  H.  L.,  137,  163 
Mortara,  G.,  285 
Multinomial  Theorem,  292 
Multiple  Correlation,  403  seq. 

Newton’s  Interpolation  Formula, 
226 

Normal  Law  of  Error,  see  Error,  Law 

of 

Normal  Correlation  Surface,  361 

Occupation,  27-9,  61,  77 
Official  Statistics,  10 

Pareto,  V.,  346 

Partial  Correlation,  398  seq.  ;  coeffi¬ 
cient  of,  400 

Partial  Regression  Coefficients,  400 
Paupers,  55-6 

Pearson,  Karl,  5,  250,  252,  344,  357-8, 
365,  368,  373,  376,  405-6,  409,  423, 
427 

Percentiles,  102 

Periodic Figures, 148, 159 seq., 220,
339-341

Persons,  W.  M.,  137,  373 
Population,  25,  see  Census 
Poynting,  J.  H.,  163 
Precision  of  Sums  and  Averages, 
312  seq.,  409  seq.  ;  of  average,  415  ; 
of  standard  deviation,  416 
Predominant  Value,  98  ;  see  Mode 
Prices,  see  Index-numbers 
Probable  Error,  113,  248,  270 
Probability,  259  seq.  ;  inverse,  409  seq 
Product, Error in, 185, 318
Purchasing  Power,  see  Index- 
numbers 

Quartile  Deviation,  112 
Quartiles,  102,  105,  107 
Questions,  15 
Quetelet,  96,  102,  255 
Quotient,  Error  in,  185,  318 

Random  Fluctuations,  148 
Ratio  of  Averages,  446-9  ;  error  in, 
187 

Registrar-General’s  Annual  Report, 
11 

Regression,  352,  362-3  ;  coefficient 
of,  362,  400  ;  equation  of,  362-4, 
400,  405-6 

Retail  Prices,  11  ;  see  Index-numbers 
Revenue,  142-5 
Rice,  241 

Rowntree,  B.  S.,  10,  41 

Samples,  17,  198,  206,  208 
Sampling,  277-284,  329-337 
Sauerbeck,  A.,  171,  205-6,  254 
Secrist,  H.,  142,  207 
Seligman,  C.  G.,  308 
Series,  137-8,  148,  153,  374 
Sheppard,  W.  F.,  241,  253,  271,  409, 
422,  450 

Sheppard's  Corrections,  253,  383,  439 
Skewness,  116,  119,  249,  251 
Small  Numbers,  Law  of,  284 
Smoothing,  134,  137-8 
Snow,  E.  C.,  37 

Standard  Deviation,  112,  249,  251  ; 
of  average,  289,  342,  419  ;  of  correlation 
coefficient,  422-4  ;  of  difference, 
288  ;  of  moments,  420  ; 
of  ratio  of  averages,  446  ;  of  ratio 
of  weighted  averages,  448  ;  of 
standard  deviation,  420  ;  precision 
of,  416 


Statistical  Coefficients,  94-5 
Statistical  Group,  110 
Statistics,  Definition  of,  3,  7,  82 
Stirling's  Formula,  267,  435 
Sum,  Error  in,  316 
Summary,  14,  16 

Tables  :  Logarithms,  176-7  ;  normal 
integral,  271  ;  second  approximation, 
303 

Tabulation,  14,  15,  52  seq. 

Tellers,  4,  23,  25,  28 
Todhunter,  413 
Trade  Unions,  60 
Trend,  137,  337-9 

Unbiassed  Errors,  190-5 
Undulatory  Fluctuations,  148 
Unemployment,  36-7,  160  seq.,  174 
Unit,  18  seq. 

Variate  Difference  Correlation, 
376,  388 

Wage  Census,  see  Census 
Wage  Statistics,  63-79,  89,  92,  97,  118 
Wallis’s  Theorem,  264,  434 
Weighted  Average,  86-7,  88,  89-94, 
200,  448  ;  error  in,  184-5,  187, 
195,  316,  320  ;  ratio  of,  187,  327, 
448 

Wheat  Statistics,  146-8,  156 
Whitworth,  246 

Wholesale  Prices,  see  Index-numbers 

Weld,  L.  D.,  454 

Willcox,  W.  F.,  26 

Wood,  G.  H.,  174,  212,  328 

Woolhouse,  241 

Yule,  U.,  104,  252,  332,  336,  364, 